PREFACE TO THE FIRST EDITION


The theory of factorial analysis is mathematical in nature, but
this book has been written so that it can, it is hoped, be read by
those who have no mathematics beyond the usual secondary 
school knowledge. Readers are, however, urged to repeat 
some at least of the arithmetical calculations for themselves. 

It is probable that the subject-matter of this book may 
seem to teachers and administrators to be far removed from 
contact with the actual work of schools. I would like 
therefore to explain that the incentive to the study of 
factorial analysis comes in my case very largely from the 
practical desire to improve the selection of children for 
higher education. When I was thirteen years of age and 
finishing an elementary school education, I won a “scholar-
ship” to a secondary school in the neighbouring town, one
of the early precursors of the present-day “free places”
in England. I have ever since then been greatly impressed
by the influence that event has had on my life, and have 
spent a great deal of time in endeavouring to improve the 
methods of selecting pupils at that stage and in lessening 
the part played by chance. It was inevitable that I should 
be led to inquire into the use of intelligence tests for this 
purpose, and inevitable in due course that the possibilities 
of factorial analysis should also come under consideration. 
It seemed to me that before any practical use could be 
made of factorial analysis a very thoroughgoing examina- 
tion of its mathematical foundations was necessary. The 
present book is my attempt at this.... It may seem remote 
from school problems. But much mathematical study and 
many calculations have to precede every improvement 
in engineering, and it will not be otherwise in the future 
with the social as well as with the physical sciences. 

GODFREY H. THOMSON
MORAY HOUSE,
UNIVERSITY OF EDINBURGH,
November 1938 


PREFACE TO THE FIFTH EDITION 


In earlier editions since the first, the chief changes in the 
second edition were that the original chapter on Simple 
Structure was expanded into three, to cover oblique 
factors and second-order factors, while Dr. D. N. Lawley 
provided a chapter on factor analysis by maximum like- 
lihood, and a corresponding section in the mathematical 
appendix. The main changes in the third edition con- 
cerned the identity of simple structure factors after 
univariate selection, and the relations between two sets of 
variates. In the fourth, the principal addition was of 
Lawley’s formulae for the standard errors of individual
residuals, and of factor loadings, when maximum likelihood 
methods have been used. 

In the present (the fifth) edition it has for the first time 
been possible to reset the whole book. This has permitted 
more extensive alterations to be made, and the oppor- 
tunity has been taken of rearranging the order of the chap- 
ters and recasting several of them, as well as inserting in 
their proper places in the text those pages which in former 
editions had to be added as appendices. Chapters V, 
VIII, and X will supply the minimum of technique, and the 
remainder of Parts II and III will give in addition a descrip- 
tion of the methods of analysis using principal components, 
using the principle of maximum likelihood, or using 
Thurstone’s Simple Structure. 

I hope, however, that readers will not merely use the book
as a set of recipes on how to carry out certain computations, 
but will study the geometrical explanations (twelve new 
diagrams have been added): and especially that they will 
ponder the implications of the two chapters, XVIII and 
XIX, on the influence of selection on factors, and the final 
two chapters on the sampling theory and certain funda- 
mental questions. 


GODFREY H. THOMSON
UNIVERSITY OF EDINBURGH,
April 1951

All science starts with hypotheses—in other words,
with assumptions that are unproved, while they may be, 
and often are, erroneous; but which are better than 
nothing to the searcher after order in the maze of pheno- 
mena. 

T. H. HUXLEY 


I am not insensible of the advantage which accrues to 
Applied Mathematics from the co-operation of the Pure 
Mathematician, and this co-operation is not infrequently 
called forth by the very imperfections of writers on Applied 
Mathematics. 

R. A. FISHER 


PART I 


THE TWO-FACTOR THEORY AND ITS 
EXTENSIONS 




CHAPTER I 
THE THEORY OF TWO FACTORS


1. Factor tests.—The object of this book is to give some
account of the “factorial analysis” of ability, as it is
called. In actual practice at the present day this science 
is endeavouring (with what hope of success is a matter of 
keen controversy) to arrive at an analysis of mind based 
on the mathematical treatment of experimental data 
obtained from tests of intelligence and of other qualities, 
and to improve vocational and scholastic advice and 
prediction by making use of this analysis in individual 
cases. It is a development of the “testing” movement—
the movement in which experimenters endeavour to devise 
tests of intelligence and other qualities in the hope of 
sorting mankind, and especially children, into different 
categories for various practical purposes ; educational (as 
in directing children into the school courses for which they 
are best suited); administrative (as in deciding that some 
persons are so weak-minded as to need lifelong institutional 
care); or vocational, etc.

There are many psychologists who would deny that from 
the scores in such tests, or indeed from any analysis, we 
can ever return to a full picture of the individual; and
without entering into any discussion of the fundamental 
controversy which this denial reveals, everyone who has 
had anything to do with tests will readily agree that this 
is certainly so at present in practice. But the tester may 
be allowed to try to make his modest diagram of the 
individual better, more useful, and if possible simpler. 

Now, the broadest fact about the results of “tests” of
all sorts, when a large number of them is given to a large 
number of people, is that every individual and every test 
is different from every other, and yet that there are certain 
rather vague similarities which run through groups of 
people or groups of tests, not very well marked off from 
one another but merging imperceptibly into neighbouring
groups at their margins. To describe an individual accurately
and completely one would have to administer to
him all the thousand and one tests which have been or 
may be devised, and record his score in each, an impossible 
plan to carry out, and an unwieldy record to use even if 
obtained. Both practical necessity and the desire for 
theoretical simplification lead one to seek for a few tests 
which will describe the individual with sufficient accuracy, 
and possibly with complete accuracy if the right tests can 
be found. If, as has been said, there is some tendency 
for the tests to fall into groups, perhaps one test from each 
group may suffice. Such a set of tests might then be said 
to measure the “factors” of the mind.

2. Fictitious factors.—The actual development of the
“factorial” movement has been rather different, and the
factors are not real but as it were fictitious tests which
represent certain aspects of the whole mind. But con-
ceivably it might have taken the more concrete form. In
that case the “factor tests” finally decided upon (by
whom, the reader will ask, and when “finally”?) would
be a set of standards which, like any other standards, would 
have to be kept inviolate, and unchanged except at rare 
intervals and for good reasons. Some tendency towards 
this there has been. The Binet scale of tests is almost an 
international standard, and there is a general agreement 
that it must not be changed except by certain people upon 
whose shoulders Binet’s mantle has fallen, and only seldom 
and as little as possible even by them. But the Binet 
scale is a very complex entity, and rather represents many 
groups of tests than any one test. By “factor tests” one
would more naturally mean tests of a “ pure ” nature, 
differing widely from one another so as to cover the whole 
personality adequately. And since actual tests always 
are more or less mixed, it is understandable why “ factors ” 
have come to be fictitious, not real, tests, to be each 
approximated to by various combinations of real tests so 
weighted that their unwanted aspects tend to cancel out, 
and their desired aspects to reinforce one another, the team 
approximating to a measure of the pure “factor.”

But how, the reader will ask, do we know a “pure”
factor, how are we to tell when the actual tests approximate
to it? To give a preliminary answer to that question we
must go back to the pioneer work of Professor Charles
Spearman in the early years of this century (Spearman,
1904). The main idea which still, rightly or wrongly, 
dominates factorial analysis was enunciated then by him, 
and practically all that has been done since has been either 
inspired or provoked by his writings. His discovery was 
that the “coefficients of correlation” between tests tend
to fall into “ hierarchical order,” and he saw that this 
could be explained by his famous “ Theory of Two Factors.” 
These technical terms we must now explain. 

3. Hierarchical order.—A coefficient of correlation is a
number which indicates the degree of resemblance between 
two sets of marks or scores. If a schoolmaster, for example, 
gives two examination papers to his class, say (1) in arith- 
metic and (2) in grammar, he will have two marks for every 
boy in the class. If the two sets of marks are identical 
the correlation is perfect, and the correlation coefficient, 
denoted by the symbol $r_{12}$, is said to be $+1$. If by some
curious chance the one list of marks is exactly like the
other one upside down (the best boy at arithmetic being
worst at grammar, and so on), the correlation is still perfect,
but negative, and $r_{12} = -1$. If there is absolutely no
resemblance between the two lists, $r_{12} = 0$. If there is a
strong resemblance, but falling short of identity, $r_{12}$ may
equal $.9$; and so on. There is a method (the Bravais-
Pearson) of calculating such coefficients, given the list of
marks.* “Tests” can obviously be correlated just like

* The “product-moment formula” is

$$r_{12} = \frac{\operatorname{sum}(x_1 x_2)}{\sqrt{\operatorname{sum}(x_1^2) \times \operatorname{sum}(x_2^2)}}$$

where $x_1$ and $x_2$ are the scores in the two tests, measured from the
average (so that approximately half the scores are negative), and
the sums are over the persons to whom the scores apply. The
quantity

$$\sigma_1^2 = \frac{\operatorname{sum}(x_1^2)}{\text{number of persons}}$$

is called the variance of Test 1, and $\sigma_1$ its standard deviation. If the
scores in each test are not only measured from their average, but
examinations, and a convenient form in which to write
down the intercorrelations of a number of tests is in a
square chequer board with the names of the tests (say
a, b, c . . .) written along the two margins, thus:

                a       b       c       d       e       f
    a           .     .48     .24     .54     .42     .30
    b         .48       .     .32     .72     .56     .40
    c         .24     .32       .     .36     .28     .20
    d         .54     .72     .36       .     .63     .45
    e         .42     .56     .28     .63       .     .35
    f         .30     .40     .20     .45     .35       .
    Totals   1.98    2.48    1.40    2.70    2.24    1.70


It was early found that such correlations tend to be 
positive, and it is of some interest to see which of a number 
of tests correlates most with the others. This can be found 
by adding up the columns of the chequer board, when we 
see in the above example that the column referring to 
Test d has the highest total (2.70). The tests can then be
rearranged and numbered in the order of these totals, thus:

                1       2       3       4       5       6
                d       b       e       a       f       c
    1  d        .     .72     .63     .54     .45     .36
    2  b      .72       .     .56     .48     .40     .32
    3  e      .63     .56       .     .42     .35     .28
    4  a      .54     .48     .42       .     .30     .24
    5  f      .45     .40     .35     .30       .     .20
    6  c      .36     .32     .28     .24     .20       .
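
The column-total procedure just described can be sketched in a few lines of Python. This is only an illustration: the dictionary `R` holds the fifteen correlations of the fictitious table, and the names `column_total` and `hierarchical_order` are labels chosen here, not terms used in the text.

```python
# Correlations of the fictitious example, one entry per pair of tests.
R = {
    ("a", "b"): 0.48, ("a", "c"): 0.24, ("a", "d"): 0.54, ("a", "e"): 0.42,
    ("a", "f"): 0.30, ("b", "c"): 0.32, ("b", "d"): 0.72, ("b", "e"): 0.56,
    ("b", "f"): 0.40, ("c", "d"): 0.36, ("c", "e"): 0.28, ("c", "f"): 0.20,
    ("d", "e"): 0.63, ("d", "f"): 0.45, ("e", "f"): 0.35,
}
TESTS = ["a", "b", "c", "d", "e", "f"]


def r(i, j):
    """Correlation between tests i and j (the table is symmetric)."""
    return R[(i, j)] if (i, j) in R else R[(j, i)]


def column_total(test):
    """Sum of the correlations of one test with all the others."""
    return round(sum(r(test, other) for other in TESTS if other != test), 2)


def hierarchical_order():
    """Tests sorted from the highest column total to the lowest."""
    return sorted(TESTS, key=column_total, reverse=True)


TOTALS = {t: column_total(t) for t in TESTS}
ORDER = hierarchical_order()
```

Here `ORDER` gives the sequence d, b, e, a, f, c, the numbering 1 to 6 used in the rearranged table.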


After the tests have been thus arranged, the tendency 
which Professor Spearman was the first to notice, and which 


are then divided through by their standard deviation, they are said
to be standardized, and we represent them by $z_1$ and $z_2$. About
two-thirds of them, then, lie between plus and minus one. With
such scores Pearson’s formula becomes

$$r_{12} = \frac{\text{sum of the products } z_1 z_2}{\text{number of persons } p}.$$

In theoretical work, an even larger unit is used, namely $\sigma\sqrt{p}$.
With these units, the sum of the squares is unity, and the sum of the
products is the correlation coefficient. The scores are then said to
be normalized, but note that this does not mean distributed in a
“normal” or Gaussian manner.
he called “hierarchical order,” is more easily seen. It is
the tendency for the coefficients in any two columns to have
a constant ratio throughout the column. Thus in our
example, if we fix our attention on Columns a and f, say,
they run (omitting the coefficients which have no partners)
thus:

          a       f
        .54     .45
        .48     .40
        .42     .35
        .24     .20

and every number on the right is five-sixths of its partner
on the left.

Our example is a fictitious one, and the tendency to 
hierarchical order in it has been made perfect in order to 
emphasize the point. It must not be supposed that the 
tendency is as clear in actual experimental data. Indeed, 
at the time there were some who denied altogether the 
existence of any such tendency in actual data. Those who 
did so were, however, mistaken, although the tendency is 
not as strong as Professor Spearman would seem originally 
to have thought (Spearman and Hart, 1912). The follow-
ing is a small portion of an actual table of correlation coeffi-
cients* from those days (Brown, 1910, 309). (Complete
tables must, of course, include many more tests; in recent
work as many as 57 in one table.)


[Table: intercorrelations of six tests, from Brown (1910); the numerical entries are not legible in this copy.]


* In this, as in other instances where data for small examples are 
taken from experimental papers, neither criticism nor comment is 
in any way intended. Illustrations are restricted to few tests for 
economy of space and clearness of exposition, but in the experiments 
from which the data are taken many more tests are employed, and 
the purpose may be quite different from that of this book. 
4. g saturations.—This tendency to “hierarchical order”
was explained by Professor Spearman by the hypothesis
that all the correlations were due to one “factor” only,
present in every test, but present in largest amount in the
test at the head of the hierarchy. This factor is his famous
“g,” to which he gave only this algebraic name to avoid
making any suggestions as to its nature, although in some 
papers and in The Abilities of Man he permitted himself 
to surmise what that nature might be. Each test had also 
a second factor present in it (but not to be found elsewhere, 
except indeed in very similar varieties of the same test), 
whence the name, “Theory of Two Factors”: really one
general factor, and innumerable second or specific factors.
It will be proved in the Mathematical Appendix* that
this arrangement would actually give rise to “hierarchical
order.” Meanwhile this can at least be made plausible.
For if Test d has that column of correlations (the first 
in our table) with the other tests solely because it is 
saturated with so-and-so much g ; and if Test b has less g 
in it than d has, it seems likely enough that b’s column of 
correlations will all be smaller in that same proportion. 
We can, moreover, find what these “ saturations ” with g 
are. For on the theory, each of our six tests contains the 
factor g, and another part which has nothing to do with 
causing correlation. Moreover, the higher the test is in 
the hierarchical ranking, the more it is “ saturated ” with g. 
Imagine now a fictitious test which had no specific, a test 
for g and for nothing else, whose saturation with g is 100 per
cent., or 1.0. This fictitious test would, of course, stand
at the head of the hierarchy, above our six real tests, and 
its row of correlations with each of those tests (their 
“saturations ”) would each be larger than any other in the 
same column. What values would these saturations take ? 
Before we answer this, let us direct our attention to the 
diagonal cells of the “matrix ” of correlations (as it is 
called—a matrix is just a square or oblong set of numbers), 
cells which we have up to the present left blank. Since 
each number in our matrix represents the correlation of the 
two tests in whose column and row it stands, there should
be inserted in each diagonal cell the number unity, repre-
senting the correlation of a test with its own identical self.
In these self-correlations, however, the specific factor of
each test, of course, plays its part. These self-correlations
of unity are the only correlations in the whole table in
which specifics do play any part. These “unities,” there-
fore, do not conform to the hierarchical rule of propor-
tionality between the columns.

* Chapter XVIII, end of Section 6, page 283.

                   g          1          2          3          4          5          6
    g              1   $r_{1g}$   $r_{2g}$   $r_{3g}$   $r_{4g}$   $r_{5g}$   $r_{6g}$
    1       $r_{1g}$          .        .72        .63        .54        .45        .36
    2       $r_{2g}$        .72          .        .56        .48        .40        .32
    3       $r_{3g}$        .63        .56          .        .42        .35        .28
    4       $r_{4g}$        .54        .48        .42          .        .30        .24
    5       $r_{5g}$        .45        .40        .35        .30          .        .20
    6       $r_{6g}$        .36        .32        .28        .24        .20          .

But the case is different with the fictitious test of pure g. 
It has no specific, and its self-correlation of unity should 
conform to the hierarchy. If, therefore, we call the 
“saturations” of the other tests $r_{1g}$, $r_{2g}$, $r_{3g}$, $r_{4g}$, $r_{5g}$ and $r_{6g}$,
we see that we must have, as we come down the first two
columns within the matrix,

$$\frac{r_{1g}}{1} = \frac{.72}{r_{2g}} = \frac{.63}{r_{3g}} = \frac{.54}{r_{4g}} = \frac{.45}{r_{5g}} = \frac{.36}{r_{6g}}$$

and similar equations for each other column with the g
column, which together indicate that the six “saturations”
are $.9$, $.8$, $.7$, $.6$, $.5$ and $.4$.
Furthermore, each correlation in the table is the product
of two of these saturations. Thus

$$.72 = .9 \times .8$$
$$.42 = .7 \times .6$$
$$r_{34} = r_{3g} \times r_{4g}$$

The six tests can now be expressed in the form of
equations:

$$\begin{aligned}
z_1 &= .9g + .436\,s_1 \\
z_2 &= .8g + .600\,s_2 \\
z_3 &= .7g + .714\,s_3 \\
z_4 &= .6g + .800\,s_4 \\
z_5 &= .5g + .866\,s_5 \\
z_6 &= .4g + .917\,s_6
\end{aligned}$$


Herein, each z represents the score of some person in the
test indicated by the subscript, a score made up of that 
person’s g and specific in the proportions indicated by the 
coefficients. The scores are supposed measured from the 
average of all persons, being reckoned plus if above the 
average and minus if below; and so too are the factors g 
and the specifics. And each of them, tests and factors, is
“ standardized,” i.e. measured in such units that the sum 
of the squares of all the scores equals the number of 
persons. This is achieved by dividing the raw scores by the 
“standard deviation.” The saturations of the specifics 
are such that the sum of the squares of both saturations 
comes in each test to unity, the whole variance of that test. 


* $.436 = \sqrt{1 - .9^2}$
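
The product rule can also be run in reverse to recover the saturations from the table. If each correlation is the product of two saturations, then for any three tests i, j, k the quantity $r_{ij} r_{ik} / r_{jk}$ equals the squared g saturation of Test i. The sketch below applies this to the fictitious hierarchy; averaging over every pair of partner tests is an illustrative choice made here (in a perfect hierarchy every pair gives the same answer), and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Correlations of the six tests in hierarchical order (1 = d, ..., 6 = c).
R = {
    (1, 2): 0.72, (1, 3): 0.63, (1, 4): 0.54, (1, 5): 0.45, (1, 6): 0.36,
    (2, 3): 0.56, (2, 4): 0.48, (2, 5): 0.40, (2, 6): 0.32,
    (3, 4): 0.42, (3, 5): 0.35, (3, 6): 0.28,
    (4, 5): 0.30, (4, 6): 0.24,
    (5, 6): 0.20,
}
TESTS = [1, 2, 3, 4, 5, 6]


def r(i, j):
    """Correlation between tests i and j (the table is symmetric)."""
    return R[(min(i, j), max(i, j))]


def g_saturation(i):
    """Estimate the g saturation of test i from every pair of other tests."""
    others = [t for t in TESTS if t != i]
    estimates = [
        sqrt(r(i, j) * r(i, k) / r(j, k)) for j, k in combinations(others, 2)
    ]
    return sum(estimates) / len(estimates)


def specific_saturation(g_sat):
    """Saturation of the specific, chosen so that the two squares sum to unity."""
    return sqrt(1.0 - g_sat ** 2)


SATURATIONS = {i: round(g_saturation(i), 3) for i in TESTS}
```

With actual experimental data the separate estimates would disagree, and how best to combine them is a question this sketch does not address.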


5. A weighted battery.—This brief outline of the Theory 
of Two Factors must for the moment suffice. It is 
enough to enable the question to be answered which at the 
end of our Section 2 led to the digression. “How,” the
reader asked, “do we know a pure factor, how are we to
tell when the actual tests approximate to it?” In the
Two-factor Theory the important pure factor was g itself,
and a test approximated to it the more, the higher it stood
in the hierarchy. Its accuracy of measurement of g was
indicated by its “saturation.” And a battery of hier-
archical tests could be weighted so as to have a combined
saturation higher than that of any one member, each test
for this purpose being weighted (as will be shown in Chapter
XV) by a number proportional to $\frac{r_{ig}}{1 - r_{ig}^2}$, where $r_{ig}$ is the
g saturation of Test i (Abilities, p. xix). The battery
saturation or multiple correlation with g is then

$$\sqrt{\frac{S}{1 + S}}, \quad \text{where } S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}.$$


Although g remained a fiction, yet a complex test, made up 
of a weighted battery of tests which were hierarchical, 
could approach nearer and nearer to measuring it exactly,
as more tests were added to the hierarchy. Each test added
would have to conform to the rule of proportionality in its 
correlations with the pre-existing battery. If it did not 
do so it would have to be rejected. The battery at any 
stage would form a kind of definition of g, which it ap- 
proached although never reached. And a man’s weighted 
score in such a battery would be an estimate of his amount 
of g, his general intelligence. The factorial description of 
a man was at this period confined to one factor, since the 
specific factors were useless as description of any man. 
For one thing, they were innumerable ; and for another, 
being specific, they were only able to indicate how the man 
would perform in the very tests in which, as a matter of 
fact, we knew exactly how he had performed.
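
As a hedged numerical illustration of the battery formula of this section, the sketch below applies the weights and the battery saturation to the six fictitious tests. The saturations .9 to .4 come from Section 4; the function names are assumptions made for the example.

```python
from math import sqrt

# g saturations of the six fictitious tests (from Section 4).
SATURATIONS = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]


def battery_weights(saturations):
    """Weights proportional to r / (1 - r^2), one per test."""
    return [r / (1.0 - r * r) for r in saturations]


def battery_saturation(saturations):
    """Multiple correlation of the weighted battery with g: sqrt(S / (1 + S))."""
    s = sum(r * r / (1.0 - r * r) for r in saturations)
    return sqrt(s / (1.0 + s))


WEIGHTS = battery_weights(SATURATIONS)
ALL_SIX = battery_saturation(SATURATIONS)
BEST_TWO = battery_saturation(SATURATIONS[:2])
```

For these six tests the battery saturation comes to about .94, higher than the .9 of the best single test, and it rises as further hierarchical tests are added.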

6. Oval diagrams.—It is convenient at this point to
introduce a diagrammatic illustration which will be useful in the
less technical part of this book, although like all illustrations it
must be taken only as such, and the analogy must not be pushed too far.
If we represent the two abilities, which are measured by tests, by
two overlapping ovals as in Figure 1, then the amount of the
overlap can be made to represent the degree to which these tests are
correlated. If we call the whole area of each oval the “variance”
of that ability, we shall be introducing the reader to another
technical term (of which a definition was given in the footnote
to page 5). Here it need mean nothing more than the whole
“amount” of the ability. The overlap we shall call the “covariance.”
If the two variances are each equal to unity, then the covariance is
the correlation coefficient. To make the diagram quantita-
tive, we can indicate in figures the contents of each part of
the variance, as in the instance shown, which gives a
correlation of $\frac{3}{5}$, or $.6$. If the separate parts of each
variance (i.e. of each oval) do not add up to the same
quantity, but to $v_1$ and $v_2$, say, then the covariance (the
amount in the overlap) must be divided by $\sqrt{v_1 v_2}$ in order
to give the correlation. Thus, Figure 2 represents a
correlation of $3 \div \sqrt{4 \times 9} = .5$. No attempt is made
in the diagrams to make the actual areas proportional to the
parts of the variance; it is the numbers written in each cell
which matter.

[Figures 1, 2 and 3: overlapping oval diagrams.]
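
The rule for reading a correlation off an oval diagram amounts to one line of arithmetic. A minimal sketch, with a hypothetical function name, is:

```python
from math import sqrt


def oval_correlation(overlap, variance_1, variance_2):
    """Covariance (the overlap) divided by the root of the product of the variances."""
    return overlap / sqrt(variance_1 * variance_2)


FIGURE_2 = oval_correlation(3, 4, 9)          # overlap 3, variances 4 and 9
UNIT_VARIANCES = oval_correlation(0.6, 1, 1)  # covariance equals the correlation
```

The first call reproduces the value .5 quoted for Figure 2; the second shows that with unit variances the covariance is itself the correlation.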

The four abilities represented by four tests can clearly 
overlap in a complicated way, as in Figure 3, which shows
one part of the variance (marked g) common to all four of 
the tests ; four parts (left unshaded) each common to three 
tests; six parts (shaded) each common to two tests ; and 
four outer parts (marked s) each specific to one test only.
The early Theory of Two Factors adopted the hypothesis 
that, except for very similar varieties of the one test, none 
of the cells of such a diagram had any contents save those 
marked g and s, the general and the specific factors. The 
variance of each ability was in that theory completely 
accounted for by the variance due to g, and the variance
due to s. 

7. Tetrad-differences.—In Section 3 it was explained that
the discovery made by Professor Spearman was that the
correlation coefficients in two columns tend to be in the
same ratio as we go up and down the pair of columns.
That is to say, if we take the columns belonging to Tests
b and f, and fix our attention on the correlations which
b and f make with d and e, we have:

              b       f
      d     .72     .45
      e     .56     .35

where

$$\frac{.72}{.45} = \frac{.56}{.35}.$$

This may be written

$$.72 \times .35 - .45 \times .56 = 0$$

and in this form is called a “tetrad-difference.” In
symbols this one is

$$r_{db}\, r_{ef} - r_{df}\, r_{eb} = 0.$$

Spearman’s discovery may therefore be put thus: “The
tetrad-differences are, or tend to be, zero.” It is clear that
this will be so if, as we said was the case in the Theory of
Two Factors, each correlation is the product of two cor-
relations with g. For then the above tetrad-difference
becomes

$$r_{dg}\, r_{bg}\, r_{eg}\, r_{fg} - r_{dg}\, r_{fg}\, r_{eg}\, r_{bg},$$

which is identically zero. The present-day test for hier-
archical order in a correlation matrix is to calculate all the
tetrad-differences (always avoiding the main diagonal) and 
see if they are sufficiently small. If they are, then the 
correlations can be explained by a diagram of the same 
nature as Figure 3, by one general factor and specifics. It
is, of course, not to be expected in actual experimenting 
that the tetrad-differences will be exactly zero; no experi- 
ment on human material can be as accurate as that. What 
is required is that they shall be clustered round zero in a 
narrow curve, falling off steadily in frequency as zero is 
departed from. The number of tetrad-differences increases
very rapidly as the number of tests grows, and in an actual 
experimental battery the tetrads are very numerous indeed. 
In the small portion of a real correlation table given above 
(page 7), with six tests, there are 45 tetrad-differences,* 
and in this instance they are distributed as follows (taking 
absolute values only and disregarding signs, which can be 
changed by altering the order of the tests) : 


From .0000 to .0999, 28 tetrad-differences.
From .1000 to .1999, 13 tetrad-differences.
From .2000 to .2796, 4 tetrad-differences.
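
The count of 45 can be checked by enumeration: every set of four tests yields three tetrad-differences, and six tests contain fifteen such sets. The sketch below is run on the fictitious hierarchy of Section 4 rather than on the experimental table, so every difference comes out zero (up to rounding error); the function name is an assumption.

```python
from itertools import combinations

# g saturations of the fictitious tests; each correlation is a product of two.
G_SATURATION = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5, 6: 0.4}


def r(i, j):
    """Correlation implied by the Theory of Two Factors."""
    return G_SATURATION[i] * G_SATURATION[j]


def tetrad_differences(tests, corr):
    """All tetrad-differences: three for every set of four tests."""
    differences = []
    for i, j, k, l in combinations(tests, 4):
        differences.append(corr(i, j) * corr(k, l) - corr(i, k) * corr(j, l))
        differences.append(corr(i, j) * corr(k, l) - corr(i, l) * corr(j, k))
        differences.append(corr(i, k) * corr(j, l) - corr(i, l) * corr(j, k))
    return differences


TETRADS = tetrad_differences(sorted(G_SATURATION), r)
```

The same function could be given any table of observed correlations; the spread of the resulting values about zero is what the text goes on to examine.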


This distribution of tetrads can be represented by a 
“histogram ” like that shown in Figure 4, which explains 
itself. It is clear that some criterion is required by which 
we can know whether the distribution of tetrad-differences, 
after they have been calculated, is narrow enough to justify 
us in assuming the Theory of Two Factors. This criterion 


* Not all independent. 
is explained in Chapter III, page 41. One form of it con-
sists in drawing a distribution curve to which, on grounds
of sampling, the tetrad-differences may be expected to con-
form. Any tetrad-differences which seem to be too large
to be accounted for by the Theory of Two Factors are then
examined, to see whether the tests giving them have any
special points of resemblance, in content, method, or other-
wise, which may explain why they disturb the hierarchy.

[Figure 4: histogram of the tetrad-differences.]

8. Group factors.—As time went on it became clear that
the tendency to zero tetrad-differences, though strong, was
not universal enough to permit an explanation of all correla-
tions between tests in terms of g and specifics, with a few
slight “disturbers” in the form of slightly overlapping
specifics. It became necessary to call in group factors,
which run through many though not through all tests,
to explain the deviations from strict hierarchical order.
The Spearman school of experimenters, however, tend
always to explain as much as possible by one central
factor, and to use group factors only when necessitated.
They take the point of view that a group factor must, as
it were, establish its right to existence, that the onus of
proof is on him who asserts a group factor. As a tiny
artificial illustration, a matrix of correlation coefficients:

              1      2      3      4
      1       .     .5     .5     .5
      2      .5      .     .8     .5
      3      .5     .8      .     .5
      4      .5     .5     .5      .

would be examined, and its three tetrad-differences found
to be:

      zero
      .15
      .15

Inspection shows that the correlation $r_{23}$ is the cause of
the discrepancies from zero, and the experimenter trained 
in the Two-factor school would therefore explain these 
correlations by a central factor running through them all, 
plus a special link joining Tests 2 and 3, as in Figure 5. 
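
The four-test illustration can be checked numerically. The sketch below computes the three tetrad-differences and then verifies that one general factor plus a link between Tests 2 and 3 reproduces the table. The particular loadings ($\sqrt{.5}$ on the general factor for every test, $\sqrt{.3}$ on the link) are an assumption made here for the example; the text does not give the figures of the diagram, and other values would serve equally well.

```python
from itertools import combinations
from math import sqrt

# The artificial four-test table: every correlation .5 except r23 = .8.
R = {(1, 2): 0.5, (1, 3): 0.5, (1, 4): 0.5, (2, 3): 0.8, (2, 4): 0.5, (3, 4): 0.5}


def r(i, j):
    """Correlation between tests i and j (the table is symmetric)."""
    return R[(min(i, j), max(i, j))]


def three_tetrads(i, j, k, l):
    """The three tetrad-differences of four tests, rounded to ten places."""
    return (
        round(r(i, j) * r(k, l) - r(i, k) * r(j, l), 10),
        round(r(i, j) * r(k, l) - r(i, l) * r(j, k), 10),
        round(r(i, k) * r(j, l) - r(i, l) * r(j, k), 10),
    )


# Hypothetical loadings: a general factor in all four tests,
# and a special link shared by Tests 2 and 3 only.
G_LOADING = {1: sqrt(0.5), 2: sqrt(0.5), 3: sqrt(0.5), 4: sqrt(0.5)}
LINK_LOADING = {1: 0.0, 2: sqrt(0.3), 3: sqrt(0.3), 4: 0.0}


def model_r(i, j):
    """Correlation implied by the general factor plus the special link."""
    return round(G_LOADING[i] * G_LOADING[j] + LINK_LOADING[i] * LINK_LOADING[j], 10)


TETRADS = three_tetrads(1, 2, 3, 4)
REPRODUCED = all(model_r(i, j) == r(i, j) for i, j in combinations([1, 2, 3, 4], 2))
```

The two non-zero differences come out as -.15 with the tests in this order; the sign changes if the tests are rearranged, which is why only absolute values are quoted.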

There are innumerable other possible ways of explaining 
these same correlations. For 
example, the linkages between 
the tests might be as in Figure 6, 
which gives exactly the same cor- 
relations. This lack of unique- 
ness is something which must 
always be borne in mind in study- 
ing factorial analysis. There are 
always, as here, innumerable 
possible analyses, and the final 
decision between them has to be 
made on some other grounds. 
The decision may be psycho- 
logical, as when for example in 
the above case an experimenter 
chooses one of the possible dia- 
grams because it best agrees with 
his psychological ideas about the 
tests. Or the decision may be 
made on the ground that we 
should be parsimonious in our
invention of “factors,” and that
where one general and one group factor will serve we should 
not invent five group factors as required by Figure 6. 
Both diagrams, however, fit the correlational facts exactly, 
and so also would hundreds of other diagrams which might 
be made. As has been said, the two-factor tendency is to 
take the diagram with the largest general factor (and the 
largest specifics also) and with as few group factors as 
possible. 

9. The verbal factor.—In this way the Theory of Two
Factors has gradually extended the “two” to include, in
addition to g and specifics, a number of other group factors, 
still, however, comparatively few. These group factors 
bear such names as the verbal factor v, a mechanical factor
m, an arithmetic factor, perseveration, etc. The charac-
teristic method of the Two-factor school can be well
seen, without any technical difficulties unduly obscuring 
the situation, in the search for a verbal factor. The idea 
that, in addition to a man’s g (which is generally thought 
of as something innate) there may be an acquired factor 
of verbal facility which enables him to do well in certain 
tests, is a not unnatural one. A battery of tests can be
assembled of which half do, and half do not, employ words 
in their construction or solution. The correlation matrix 
will then have four quadrants, the quadrant V containing 
the correlations of the verbal tests among themselves, the
quadrant P the correlations of the non-verbal or, say,
pictorial tests, and the quadrants C containing the cross-
correlations of the one kind of test with the other. If the
whole table is sufficiently “hierarchical,” there is no
evidence for a group factor v or a group factor p. If
either of these factors exists, there will be differences to be
noticed between the six kinds of tetrad which can be
chosen, namely:

            p  p              v  v              p  p
        v   .  .          v   x  x          p   x  x
        v   .  .          v   x  x          p   x  x
            (1)               (2)               (3)

            v  v              v  p              v  p
        v   x  x          p   .  x          v   x  .
        p   .  .          p   .  x          p   .  x
            (4)               (5)               (6)

A tetrad like 1, with two verbal tests along one margin
and two pictorial tests along the other, will be found in 
quadrant C. Neither a factor common to the verbal tests 
only, nor one common to the pictorial tests only, will add 
anything to any of the four correlations in such a tetrad- 
difference, which may be expected, therefore, to tend to be 
zero. If the tetrads in C seem to do so, the other tetrads 
can be examined. Tetrad 2 is taken wholly from the V 
quadrant. In it the verbal factor, if any is present, will 
reinforce all the four correlations, and should not therefore 
disturb very much the tendency to a zero tetrad-difference. 
(Reinforced correlations are marked by x in the diagrams.)
The same is true of Tetrad 3 taken wholly from the P
quadrant. Tetrads 4 and 5 have each two of their cor-
relations reinforced, by the v factor in 4 and by the p
factor in 5, but in each case in such a way as not to change 
very much the tetrad-difference. It is when we come to 
tetrads like 6, which have one correlation in each of the 
four quadrants, that the presence of either or both factors 
should show itself strongly: for the two reinforced correla-
tions here occur on a diagonal, and inflate only the one
member of the tetrad-difference

$$r_{vv}\,r_{pp} - r_{vp}\,r_{pv}$$

If, then, a verbal factor, and also a pictorial factor, are 
present, the tendency for the tetrad-differences to vanish 
should become less and less strong as we consider tetrads
of the kinds 1, 2 and 3, 4 and 5, and especially 6, where
the tetrad-differences should leap up. If only the verbal
factor is present, tetrad-differences of the kind 3 should
vanish rather more than those of the kind 2. But it will
not be easy to distinguish between either suspected factor, 
and both. Tetrads like 6, however, should give conclusive 
evidence of the presence of one or the other, if not both. 
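
The argument can be tried numerically. The sketch below is only an illustration under assumed loadings (the values in `g` and `group` are invented, not taken from any battery): every test loads on g, verbal tests also on v, pictorial tests also on p. It shows a tetrad of kind 1 vanishing while a tetrad of kind 6 does not.

```python
# Hypothetical loadings, invented for illustration only.
g = {"v1": 0.7, "v2": 0.6, "p1": 0.8, "p2": 0.5}       # general factor
group = {"v1": 0.5, "v2": 0.4, "p1": 0.6, "p2": 0.3}   # v for verbal, p for pictorial

def r(a, b):
    """Correlation implied by the model: g part, plus a group part if both tests are of one kind."""
    shared = group[a] * group[b] if a[0] == b[0] else 0.0
    return g[a] * g[b] + shared

def tetrad(rows, cols):
    """Tetrad-difference for the row tests (a, b) and the column tests (c, d)."""
    (a, b), (c, d) = rows, cols
    return r(a, c) * r(b, d) - r(a, d) * r(b, c)

kind1 = tetrad(("v1", "v2"), ("p1", "p2"))   # wholly inside quadrant C
kind6 = tetrad(("v1", "p1"), ("v2", "p2"))   # one correlation from each quadrant
print(round(kind1, 4), round(kind6, 4))
```

In kind 6 the two reinforced correlations sit on the diagonal, so only the first product is inflated.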
Methods like this were employed by Miss Davey (Davey, 
1926), who found a group factor, but not one running 
through all the verbal tests, and by Dr. Stephenson 
(Stephenson, 1931), whose results indicated the presence 
of a verbal factor.* 

* T. L. Kelley had already found by other methods strong evidence
of a verbal factor (Kelley, 1928, 104, 121 et passim).

10. Group-factor saturations.—Just as the g saturations
of tests can be calculated, so also can the saturation of a 
test with any group factor it may contain. The general 
method of the Two-factor school is first to work with 
batteries of tests which give no unduly large tetrad- 
differences, and which also appear to satisfy one’s general 
impression that they test intelligence. From such a 
battery, of which the best example is that of Brown and 
Stephenson (B. and S., 1933), the g saturations can be
calculated.* Each test has, however, also its specific,
which, so long as it is in the hierarchical battery, is unique to it 
and shared with no other member of the battery. A test 
may now be associated with some other battery of different 
tests, and with some of these it may share a part of its 
former specific, as a group factor which will increase its 
correlation beyond that caused by g. The excess correla- 
tion enables the saturation of the test with this group 
factor to be found (the details are too technical for this
chapter) and the specific saturation correspondingly
reduced. Finally, the tester may be able to give the
composition of a test as, let us say (to invent an example):

    .71g + .40v + .34n + .47s
where g is Spearman’s g, v is Stephenson’s verbal factor, 
n is a number factor, and s is the remaining specific of the 
test. The coefficients are the “saturations ” of the test 
with each of these; that is, the correlations believed to exist 
between the test and these fictitious tests called factors. 
The squares of these saturations represent the fractions of
the test-variance contributed by each factor, and these
squares sum to unity, thus:

    Factor    Saturation squared
      g             .5041
      v             .1600
      n             .1156
      s             .2209
                   ------
                   1.0006
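
As a quick check of the arithmetic, the following sketch squares the four invented saturations of the example and adds them up. The variable names are ours; nothing here goes beyond the figures already given.

```python
# Saturations of the invented example: .71g + .40v + .34n + .47s
saturations = {"g": 0.71, "v": 0.40, "n": 0.34, "s": 0.47}

# Each squared saturation is the fraction of test-variance due to that factor.
squares = {factor: round(a * a, 4) for factor, a in saturations.items()}
total = round(sum(squares.values()), 4)

for factor, share in squares.items():
    print(factor, share)
print("total", total)
```

The total differs from unity only because the saturations are given to two places.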


* For the sake of clarity the text here rather oversimplifies the
situation. The battery of Brown and Stephenson contains in fact
a rather large group factor as well as g and specifics.


CHAPTER II 
BIFACTOR ANALYSIS AND CLUSTERS 


1. The bifactor method.—Holzinger’s Bifactor Method 
(Holzinger, 1935, 1937a) may be looked upon as another 
natural extension of the simple Two-factor plan of analysis. 
It endeavours to analyse a battery of tests into one general 
factor and a number of mutually exclusive group factors.
A diagram of such an analysis looks like a “hollow stair-
case,” thus:


    Test   g   h   k   l
      1    x   x
      2    x   x
      3    x   x
      4    x       x
      5    x       x
      6    x       x
      7    x           x
      8    x           x
      9    x           x


Here factor g runs through all, as is indicated by the 
column of crosses. Factors h, k, and l run through mutu- 
ally exclusive groups of tests each. The saturations with 
g can be calculated from sub-batteries of tests which form 
perfect hierarchies, by selecting only one test from each 
group (in every possible way). After these are known, 
the correlation due to g can be removed, and then the 
saturations due to each group factor found. 

The following artificial example will illustrate some of 
the points of this method. Consider these correlations, 
which to save space are printed without their decimal 
points : 

          1   2   3   4   5   6   7   8   9  10  11  12

     1    .  57  40  45  63  63  20  28  74  52  45  34
     2   57   .  34  25  53  39  17  44  68  43  39  56
     3   40  34   .  18  57  27  59  16  44  70  73  20
     4   45  25  18   .  27  51  09  12  32  22  20  15
     5   63  53  57  27   .  42  40  26  68  67  63  31
     6   63  39  27  51  42   .  13  18  50  34  30  23
     7   20  17  59  09  40  13   .  08  22  60  64  10
     8   28  44  16  12  26  18  08   .  35  21  18  43
     9   74  68  44  32  68  50  22  35   .  56  50  44
    10   52  43  70  22  67  34  60  21  56   .  78  25
    11   45  39  73  20  63  30  64  18  50  78   .  23
    12   34  56  20  15  31  23  10  43  44  25  23   .


There are two stages in a bifactor analysis. The first 
problem is to decide how to group the tests so that those 
are brought together which share a second or group factor. 
Then the best method of calculating is needed to find the 
loadings. 

The grouping can partly be done subjectively by con- 
sidering the nature of each test and putting together 
memory tests, or tests involving number, and so on. 
Holzinger uses a “ coefficient of belonging,” B, to determine 
the coherence of a group. B is equal to the average of the 
intercorrelations of the group divided by their average 
correlation with the other tests in the battery. The higher 
B is, the more the group is distinguishable as a group. 
He begins with a pair of tests which correlate highly with 
one another, and finds their B. Then he adds a third test 
and finds the B of the three. Then another and another, 
until B drops too low. There is no fixed threshold for B, 
but a rather sudden drop would indicate the end of a 
group. 
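
A minimal sketch of the coefficient of belonging follows, applied to the artificial twelve-test correlations above (entered as the upper triangle, in hundredths). The function names `build` and `belonging` are ours, and the particular order in which tests are added to the group is chosen only for illustration.

```python
from itertools import combinations

# Upper triangle of the artificial correlations: row i holds r(i, i+1) ... r(i, 12).
UPPER = """\
57 40 45 63 63 20 28 74 52 45 34
34 25 53 39 17 44 68 43 39 56
18 57 27 59 16 44 70 73 20
27 51 09 12 32 22 20 15
42 40 26 68 67 63 31
13 18 50 34 30 23
08 22 60 64 10
35 21 18 43
56 50 44
78 25
23"""

def build(upper):
    """Return (n, r) with r[i, j] in hundredths and tests numbered from 1."""
    rows = [line.split() for line in upper.splitlines()]
    r = {}
    for i, row in enumerate(rows, start=1):
        for k, cell in enumerate(row):
            r[i, i + 1 + k] = r[i + 1 + k, i] = int(cell)
    return len(rows) + 1, r

N, R = build(UPPER)

def belonging(group):
    """B: average intercorrelation of the group over its average correlation with the rest."""
    inside = [R[i, j] for i, j in combinations(sorted(group), 2)]
    outside = [R[i, j] for i in group for j in range(1, N + 1) if j not in group]
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))

for group in ([10, 11], [3, 10, 11], [3, 7, 10, 11], [3, 5, 7, 10, 11], [1, 3, 7, 10, 11]):
    print(group, round(belonging(group), 2))
```

Adding Test 1 to the group 3, 7, 10, 11 pulls B down much more than adding Test 5 does, which is the kind of drop the procedure watches for.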

2. Tryon’s grouping. Another plan is to make a graph
or profile of each row of correlations and compare these
(Tryon, 1939), grouping together those tests with similar
profiles. I find it easier to consider only the peaks of each
row and compare the rows with regard to these. If we
mark, in each row of the above, the five highest correlations
in that row, and also the diagonal cell, we get the following
set of peaks:


       1   2   3   4   5   6   7   8   9  10  11  12
   1   x   x           x   x           x   x
   2   x   x           x           x   x           x
   3           x       x       x       x   x   x
   4   x   x       x   x   x           x
   5   x       x       x               x   x   x
   6   x   x       x   x   x           x
   7           x       x       x       x   x   x
   8   x   x           x           x   x           x
   9   x   x           x   x           x   x   x
  10           x       x       x       x   x   x
  11           x       x       x       x   x   x
  12   x   x           x           x   x           x


We then see that, in the rows,

    (a) Tests 3, 7, 10, 11 have identical peaks,
    (b) Tests 2, 8, 12 have identical peaks,
    (c) Tests 4, 6 have identical peaks,

and we take these as nuclei for three groups. There re-
main Tests 1, 5, and 9. Their average correlations with
each of the above nuclei are:


    Test    (a)   (b)   (c)
      1      39    40    54
      5      57    37    35
      9      43    49    41


We therefore add Test 1 to group c, Test 5 to group a,
and (less certainly) Test 9 to group b. We then rewrite
our matrix with the tests thus grouped (see next page):
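
The grouping by peaks can be sketched as follows on the artificial correlations (upper triangle, in hundredths). One assumption is ours and should be noted: where two correlations tie for fifth place in a row (this happens only in the row of Test 9), both are kept as peaks. The helper names are hypothetical.

```python
# Upper triangle of the artificial correlations: row i holds r(i, i+1) ... r(i, 12).
UPPER = """\
57 40 45 63 63 20 28 74 52 45 34
34 25 53 39 17 44 68 43 39 56
18 57 27 59 16 44 70 73 20
27 51 09 12 32 22 20 15
42 40 26 68 67 63 31
13 18 50 34 30 23
08 22 60 64 10
35 21 18 43
56 50 44
78 25
23"""

rows = [line.split() for line in UPPER.splitlines()]
N = len(rows) + 1
R = {}
for i, row in enumerate(rows, start=1):
    for k, cell in enumerate(row):
        R[i, i + 1 + k] = R[i + 1 + k, i] = int(cell)

def peaks(i, k=5):
    """Tests with the k highest correlations in row i (ties at the cut kept), plus i itself."""
    row = {j: R[i, j] for j in range(1, N + 1) if j != i}
    cut = sorted(row.values(), reverse=True)[k - 1]
    return frozenset(j for j, v in row.items() if v >= cut) | {i}

# Tests whose rows show identical peaks form the nuclei of the groups.
by_peaks = {}
for t in range(1, N + 1):
    by_peaks.setdefault(peaks(t), []).append(t)
nuclei = sorted(tests for tests in by_peaks.values() if len(tests) > 1)

def average_with(test, nucleus):
    """Average correlation (hundredths) of a leftover test with one nucleus."""
    return sum(R[test, j] for j in nucleus) / len(nucleus)

print(nuclei)
for t in (1, 5, 9):
    print(t, [round(average_with(t, n), 1) for n in ([3, 7, 10, 11], [2, 8, 12], [4, 6])])
```

The leftover tests are then attached to the nucleus with which they correlate most highly on the average.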

          3   5   7  10  11 |   2   8   9  12 |   1   4   6 |  Sum D  Sum E  Sum F

     3    .  57  59  70  73 |  34  16  44  20 |  40  18  27 |   1.14    .85
     5   57   .  40  67  63 |  53  26  68  31 |  63  27  42 |   1.78   1.32
     7   59  40   .  60  64 |  17  08  22  10 |  20  09  13 |    .57    .42
    10   70  67  60   .  78 |  43  21  56  25 |  52  22  34 |   1.45   1.08
    11   73  63  64  78   . |  39  18  50  23 |  45  20  30 |   1.30    .95

     2   34  53  17  43  39 |   .  44  68  56 |  57  25  39 |   1.86          1.21
     8   16  26  08  21  18 |  44   .  35  43 |  28  12  18 |    .89           .58
     9   44  68  22  56  50 |  68  35   .  44 |  74  32  50 |   2.40          1.56
    12   20  31  10  25  23 |  56  43  44   . |  34  15  23 |   1.09           .72

     1   40  63  20  52  45 |  57  28  74  34 |   .  45  63 |          2.20   1.93
     4   18  27  09  22  20 |  25  12  32  15 |  45   .  51 |           .96    .84
     6   27  42  13  34  30 |  39  18  50  23 |  63  51   . |          1.46   1.30

    Block totals: D = 6.24, E = 4.62, F = 4.07.

It will be seen that certain additions (the row totals within
the rectangles D, E, and F, and the totals of those rectangles)
have been made in readiness for the various methods of
calculation of the g loadings which are then possible. If we
symbolize the table as

    A  D  E
    D  B  F
    E  F  C



all methods depend on using only the correlations in the 
rectangles D, E, and F, since the suspected group factors 
which increase the correlations in A, in B, and in C do not 
influence D, E, and F. Each correlation in the latter 
rectangles is therefore the product of two g-saturations 
(see page 9). Thus: 

$$r_{31} = .40 = l_3 l_1,$$

$$r_{32} = .34 = l_3 l_2,$$

$$r_{12} = .57 = l_1 l_2,$$

$$l_3^2 = \frac{.40 \times .34}{.57} = .24, \qquad l_3 = .49,$$
where it should be noted that the three correlations come 
from E, D, and F respectively. 
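
The same step in a few lines of code, as a check on the arithmetic; the variable names are ours.

```python
from math import sqrt

# One correlation from each of the rectangles E, D, and F.
r31, r32, r12 = 0.40, 0.34, 0.57

# r31 * r32 / r12 = (l3 l1)(l3 l2) / (l1 l2) = l3 squared.
l3_squared = r31 * r32 / r12
l3 = sqrt(l3_squared)
print(round(l3_squared, 2), round(l3, 2))
```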

But this value for the loading of Test 3 depends upon three 
correlations only and would, in a real experimental set of 
data, vary somewhat with our choice of the three. A 
method of using all the possible correlations in these three
rectangles is needed. One such is given by Holzinger in
his Manual (1937).

3. Holzinger’s formula.—If all possible ways of choosing
the two other tests are taken, and the fraction
$r_{3i}\,r_{3j}/r_{ij}$ formed
in each case; and if the numerators of these fractions are
added together to form a global numerator, and their
denominators to form a global denominator; it will then
be found that the fraction thus formed is equal to

$$l_3^2 = \frac{1.14 \times .85}{4.07} = .24, \qquad l_3 = .49,$$

and this time all available correlations have been used.
The rule is to multiply the two totals in the row of the
test (1.14 × .85) and divide by the grand total of the
block formed by the other tests concerned (1, 4, and 6
with 2, 8, 9, and 12, i.e. 4.07). For Test 2 this rule gives

$$l_2^2 = \frac{1.86 \times 1.21}{4.62} = .49, \qquad l_2 = .70.$$

This Holzinger method is not difficult to extend to four
or more groups. If we symbolize a four-group matrix by

    A  D  E  G
    D  B  F  H
    E  F  C  K
    G  H  K  .

and consider the first test, then its g-loading l is given by

$$l^2 = \frac{de + dg + eg}{F + H + K},$$

where d, e, and g are the sums of its row in D, E, and G.
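
Holzinger's rule for the three-group case can be sketched as below. The row totals and block totals are those of the artificial example, entered in hundredths; the dictionary layout and the function name are our own choices, not Holzinger's.

```python
from math import sqrt

# Row totals (hundredths) of each test in the two off-diagonal rectangles it enters.
ROW_TOTALS = {
    3: (114, 85), 5: (178, 132), 7: (57, 42), 10: (145, 108), 11: (130, 95),  # group a: (D, E)
    2: (186, 121), 8: (89, 58), 9: (240, 156), 12: (109, 72),                 # group b: (D, F)
    1: (220, 193), 4: (96, 84), 6: (146, 130),                                # group c: (E, F)
}
GROUP = {"a": [3, 5, 7, 10, 11], "b": [2, 8, 9, 12], "c": [1, 4, 6]}
# Grand total of the rectangle formed by the other two groups: F, E, D respectively.
OTHER_BLOCK = {"a": 407, "b": 462, "c": 624}

def holzinger_loading(test):
    """Multiply the two row totals and divide by the total of the remaining rectangle."""
    group = next(name for name, tests in GROUP.items() if test in tests)
    first, second = ROW_TOTALS[test]
    # Totals are in hundredths, hence the factor 100 in the denominator.
    return sqrt(first * second / (100 * OTHER_BLOCK[group]))

loadings = {t: round(holzinger_loading(t), 2) for t in ROW_TOTALS}
print(loadings)
```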
4. Burt’s formula.—Another method is given by Burt
(1940, 478). For the numerator of each g loading he takes
the sum of the side totals which Holzinger multiplied.
Thus the numerators are:

    for Test  3,  1.14 +  .85 = 1.99
    for Test  5,  1.78 + 1.32 = 3.10
    for Test  2,  1.86 + 1.21 = 3.07
    for Test 12,  1.09 +  .72 = 1.81
    for Test  6,  1.46 + 1.30 = 2.76

and similarly for the other tests.

The denominators differ in group a, group b, and group c,
but all are formed from the three quantities 6.24, 4.62,
and 4.07. For group a the denominator is:

$$\sqrt{4.07\left\{\frac{6.24}{4.62} + \frac{4.62}{6.24} + 2\right\}} = 4.08.$$

It will be seen that the two quantities within the curly
brackets are the totals of D and E, the two rectangles
from which the numerators of group a come. By analogy
the reader can write down the denominators of group b
and group c: they come to 4.40 and 5.01. Dividing the
numerators by the appropriate denominators, we get for
the g loadings:

    Test         3    5    7   10   11    2    8    9   12    1    4    6
    g loading   .49  .76  .24  .62  .55  .70  .33  .90  .41  .82  .36  .55
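
A sketch of Burt's calculation on the same totals follows. The form of the denominator for groups b and c is written by analogy with that for group a, as the text invites; the names used are ours.

```python
from math import sqrt

# Row totals of each test in the two off-diagonal rectangles it enters.
ROW_TOTALS = {
    3: (1.14, 0.85), 5: (1.78, 1.32), 7: (0.57, 0.42), 10: (1.45, 1.08), 11: (1.30, 0.95),
    2: (1.86, 1.21), 8: (0.89, 0.58), 9: (2.40, 1.56), 12: (1.09, 0.72),
    1: (2.20, 1.93), 4: (0.96, 0.84), 6: (1.46, 1.30),
}
GROUP = {"a": [3, 5, 7, 10, 11], "b": [2, 8, 9, 12], "c": [1, 4, 6]}
D, E, F = 6.24, 4.62, 4.07   # totals of the three rectangles

def denominator(own_1, own_2, other):
    """own_1, own_2: totals of the two rectangles supplying the group's numerators."""
    return sqrt(other * (own_1 / own_2 + own_2 / own_1 + 2))

DENOM = {"a": denominator(D, E, F), "b": denominator(D, F, E), "c": denominator(E, F, D)}

def burt_loading(test):
    group = next(name for name, tests in GROUP.items() if test in tests)
    return sum(ROW_TOTALS[test]) / DENOM[group]

print({name: round(value, 2) for name, value in DENOM.items()})
print({t: round(burt_loading(t), 3) for t in ROW_TOTALS})
```

On these artificial data the results agree with Holzinger's to within the rounding of the totals.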


The proof of Burt’s formula is surprisingly easy. If the
reader will write down, in place of the correlations in D,
E, and F, the literal symbols $l_i l_j$ (for $r_{ij}$), since our
hypothesis is that only g is concerned in these correlations,
and will write out the sums, etc., of the above calculation
literally, he will find that Burt’s formula simplifies almost
immediately to one l, that of the test in question. Burt
only gives his formula for three groups. It can be extended
to the case of more groups, but becomes cumbersome and
rather unwieldy.

5. The test of correct grouping. Now comes the test of
whether our grouping is correct, and our hypothesis valid
that groups a, b, and c have nothing in common but the
factor g. Using the loadings we have found, form all the
products $l_i l_j$ and subtract them from the experimental
correlations. All the correlations in D, E, and F should
then vanish or, in a real set of data (ours are artificial),
become insignificant. There should, however, remain
residues in A, B, and C due to the second factors running
through groups a, b, and c respectively. In our example
the subtraction of the quantities $l_i l_j$ gives the residues
shown at the top of page 25.

    Test   g loading     3   5   7  10  11     2   8   9  12     1   4   6

      3      .49         .  20  47  40  46
      5      .76        20   .  22  20  21
      7      .24        47  22   .  45  51
     10      .62        40  20  45   .  44
     11      .55        46  21  51  44   .

      2      .70                               .  21  05  27
      8      .33                              21   .  05  29
      9      .90                              05  05   .  07
     12      .41                              27  29  07   .

      1      .82                                                 .  15  18
      4      .36                                                15   .  31
      6      .55                                                18  31   .

The correlations left in A, if they are due to only one
other factor (now that g has been removed), ought to show
zero or very small tetrads; and so they do. Those in B
are also hierarchical. Those in C are too few to form a 
tetrad. The second factor in each of these submatrices 
can now be found in the same way as g is found from a 
matrix with no other factor: see page 9 and, later in this 
book, pages 42 to 44. The reader should complete the 
calculation, and will find these loadings : 


                  Factors
    Test     g     u     v     w

      3     .49   .65
      5     .76   .30
      7     .24   .72
     10     .62   .62
     11     .55   .71
      2     .70         .44
      8     .33         .47
      9     .90         .11
     12     .41         .62
      1     .82               .29
      4     .36               .50
      6     .55               .62
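
For a group of exactly three tests the second-factor loadings follow from the three residues in the same way as a g loading follows from three correlations. The sketch below applies this to the residues .15, .18, and .31 left among Tests 1, 4, and 6; the function name is ours, and because it starts from residues rounded to two places its results agree with the tabled loadings only approximately.

```python
from math import sqrt

def triad_loadings(r_ab, r_ac, r_bc):
    """Loadings of tests a, b, c on one common factor, from their three residual correlations."""
    a = sqrt(r_ab * r_ac / r_bc)
    b = sqrt(r_ab * r_bc / r_ac)
    c = sqrt(r_ac * r_bc / r_ab)
    return a, b, c

# Residues among Tests 1, 4, and 6 after g has been removed.
w1, w4, w6 = triad_loadings(0.15, 0.18, 0.31)
print(round(w1, 3), round(w4, 3), round(w6, 3))
```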
An actual set of data will not give so perfect a hollow
staircase, but at this stage the strict bifactor hypothesis 
can be departed from and additional small loadings or 
further factors added to perfect the analysis. Where a 
bifactor pattern exists, a simple method of extracting 
correlated or oblique factors has been given by Holzinger 
(1944) “based on the idea that the centroid pattern 
coefficients for the sections of approximately unit rank 
may be interpreted as structure values for the entire 
matrix.” 

6. Cluster analysis.—This is connected with the bifactor 
method, which is possible when clusters do not overlap. 
But it is by no means rare to find two or three variables 
entering into several distinct clusters. Raymond Cattell’s 
article (1944a) describes four methods of determining
clusters, and gives references which will lead the interested
reader back to much of the previous work; see also
Tryon’s work Cluster Analysis, 1939. The most naive
method of classifying tests into clusters, one needing no
mathematics whatever, is simply to put together all the
tests which intercorrelate above a certain level. We can
illustrate this adequately on the above example. Let us
collect into clusters tests which correlate with one another
at least 0.40. A routine is desirable to ease the task and
avoid overlooking any clusters. Turn to the table on
page 20 and write down from the first row all the tests
which have correlations of 0.40 or more with Test 1,
including itself.


    1   2   3   4   5   6   9  10  11
        2   5   9  10
            5   9  10
                9  10


Cluster A, Tests 1, 2, 5, 9, 10. 


Then consider the test next to No. 1 in this line, which 
happens to be Test 2, and go along its line in the correlation 
table to see which of the tests already noted also correlates 
sufficiently with Test 2. They are 5, 9, and 10. The 
other tests of our first line drop out. We then look along
the line of Test 5’s correlation coefficients, and find that
Tests 9 and 10 survive this scrutiny. Finally, we note
that Tests 9 and 10 themselves correlate enough. The
cluster A is therefore (reading down the left-hand edge of
the above triangular set of notes) composed of Tests 1, 2,
5, 9, and 10. At this point, to avoid missing other clusters
which may begin with Test 1, it is necessary to consider 
what would have happened had Test 2 not been in the 
battery. It would be tedious to describe the whole pro- 
cedure here, but the reader is urged to go through it, when 
he will find six clusters, shown in this diagram. 


Figure 7. 
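
The routine that produced cluster A can be sketched as follows, on the artificial correlations (upper triangle, in hundredths). It follows only the one path described in the text, always taking the next surviving test in the line; it does not carry out the further step of asking what would have happened had a test been absent, so it is not a complete search for all clusters. The function name is ours.

```python
# Upper triangle of the artificial correlations: row i holds r(i, i+1) ... r(i, 12).
UPPER = """\
57 40 45 63 63 20 28 74 52 45 34
34 25 53 39 17 44 68 43 39 56
18 57 27 59 16 44 70 73 20
27 51 09 12 32 22 20 15
42 40 26 68 67 63 31
13 18 50 34 30 23
08 22 60 64 10
35 21 18 43
56 50 44
78 25
23"""

rows = [line.split() for line in UPPER.splitlines()]
N = len(rows) + 1
R = {}
for i, row in enumerate(rows, start=1):
    for k, cell in enumerate(row):
        R[i, i + 1 + k] = R[i + 1 + k, i] = int(cell)

def cluster_from(start, level=40):
    """Write down the tests correlating with `start` at the level, then prune line by line."""
    members = [start]
    candidates = [j for j in range(1, N + 1) if j != start and R[start, j] >= level]
    while candidates:
        nxt = candidates.pop(0)            # the test next in the line
        members.append(nxt)
        # Keep only the tests already noted that also correlate enough with nxt.
        candidates = [j for j in candidates if R[nxt, j] >= level]
    return members

print(cluster_from(1))
```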


7. Comparison with the bifactor groups.—If we compare
these clusters with the grouping we found by Tryon’s
method of profiles (or peaks), we see that our present clusters 
F, E, and C are those we arrived at formerly (except for the 
absence of Test 9 from cluster E). And we notice also 
that in our diagram these are mutually exclusive clusters. 
The missing Test 9 is the one we formerly had most doubt 
about classifying. The reason can be seen from the analy- 
sis we have already made. It is highly saturated with the 
general factor, and only very weakly with the verbal 
factor which decides its bifactor group. 

8. A less artificial example.—The above example was an
artificial one, made so as to “come out” exactly. Let us
turn to a more realistic example where this is not the case.
The following correlations (decimal points are again
omitted) are from an actual report, but to obviate some
embarrassments in a didactic example I have made all the
coefficients rather larger than they actually were. The
first seven “tests” are examinations in school subjects,
the next four are “non-verbal” tests with simple pieces
of apparatus, and the last three are special tests supposed
to be uncontaminated by any group factor other than g,
v, and k (the “space” factor).

                         1   2   3   4   5   6   7   8   9  10  11  12  13  14

     1 Physics           .  76  82  68  64  40  28  44  19  16  21  45  11  10
     2 Chemistry        76   .  68  62  52  26  26  43  36  29  23  38  15  13
     3 Mathematics      82  68   .  68  47  48  21  37  23  13  20  43  19  18
     4 French           68  62  68   .  45  23  34  29  25 -13  05  26  34  00
     5 Mech. Draw.      64  52  47  45   .  36  17  53  55  38  21  36  07  42
     6 Problems         40  26  48  23  36   .  19  51  47  20  40  47  05  36
     7 Reading          28  26  21  34  17  19   .  09  07  02  17 -07  38  03
     8 Koh’s Blocks     44  43  37  29  53  51  09   .  81  50  50  64  43  65
     9 Cube Constr.     19  36  23  25  55  47  07  81   .  42  53  53  37  66
    10 Form Board       16  29  13 -13  38  20  02  50  42   .  52  34  19  38
    11 Passalong        21  23  20  05  21  40  17  50  53  52   .  32  32  46
    12 g test           45  38  43  26  36  47 -07  64  53  34  32   .  40  57
    13 v test           11  15  19  34  07  05  38  43  37  19  32  40   .  45
    14 k test           10  13  18  00  42  36  03  65  66  38  46  57  45   .


When by the above method we sort these tests into clus-
ters, using 0.40 as boundary line, we obtain the following
diagram:

In passing, we may [...] what Raymond Cattell (1946)
[...] one which forms [...] clusters. Here the pair 8 and 9
are never separated, [...] nuclear cluster. For bifactor
analysis, however, we want non-overlapping clusters.

9. A first attempt at grouping. Searching in this diagram 
for at least three non-overlapping contours, we find 
clusters A, F, and either C or D. Of the alternatives let 
us take D, and rewrite our table of correlations with these 
clusters separated. This leaves Tests 6 and 7 out of the 
picture, and further study of the diagram leads us also to 
omit 5, which is linked with both F and D through cluster B. 
Our table, and its calculations, then is as follows : 


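          1   2   3   4 |   8   9  10  11 |  12  13  14 |  Sum D  Sum E  Sum F

     1    .  76  82  68 |  44  19  16  21 |  45  11  10 |   1.00    .66
     2   76   .  68  62 |  43  36  29  23 |  38  15  13 |   1.31    .66
     3   82  68   .  68 |  37  23  13  20 |  43  19  18 |    .93    .80
     4   68  62  68   . |  29  25 -13  05 |  26  34  00 |    .46    .60

     8   44  43  37  29 |   .  81  50  50 |  64  43  65 |   1.53          1.72
     9   19  36  23  25 |  81   .  42  53 |  53  37  66 |   1.03          1.56
    10   16  29  13 -13 |  50  42   .  52 |  34  19  38 |    .45           .91
    11   21  23  20  05 |  50  53  52   . |  32  32  46 |    .69          1.10

    12   45  38  43  26 |  64  53  34  32 |   .  40  57 |          1.52   1.83
    13   11  15  19  34 |  43  37  19  32 |  40   .  45 |           .79   1.31
    14   10  13  18  00 |  65  66  38  46 |  57  45   . |           .41   2.15

    Block totals: D = 3.70, E = 2.72, F = 5.29.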


From this table, by Holzinger’s formula, we obtain the 
g loadings shown at the right of the next table. For 
example : 


$$l_{10}^2 = \frac{0.45 \times 0.91}{2.72} = .15055, \qquad l_{10} = .388.$$


When, using these g loadings, we remove the parts of the 
correlations due to that factor, we get the following table 
of residues. For example:

$$.76 - .353 \times .404 = .62.$$

Residues

          1   2   3   4 |   8   9  10  11 |  12  13  14 |  g loadings

     1    .  62  69  60 |  09 -08  02  02 |  14 -08 -07 |    .353
     2   62   .  53  53 |  03  05  13  02 |  03 -06 -07 |    .404
     3   69  53   .  59 |  00 -06 -02  00 |  10 -01  00 |    .375
     4   60  53  59   . |  07  07 -22 -07 |  06  22 -11 |    .228

     8   09  03  00  07 |   .  05  12 -02 | -21 -09  17 |    .984
     9  -08  05 -06  07 |  05   .  12  12 | -14 -04  28 |    .769
    10   02  13 -02 -22 |  12  12   .  32 |  00 -02  19 |    .388
    11   02  02  00 -07 | -02  12  32   . | -14  04  20 |    .528

    12   14  03  10  06 | -21 -14  00 -14 |   . -06  15 |    .867
    13  -08 -06 -01  22 | -09 -04 -02  04 | -06   .  19 |    .529
    14  -07 -07  00 -11 |  17  28  19  20 |  15  19   . |    .488


On examining these residues, however, we see that this 
time our hypothesis, that the clusters are exclusive with 
regard to their second group factors, is not justified. True, 
many of the residues in the side squares are very small. 
But two facts strike the eye: Test 14 (the k or space 
factor test) has quite large residues with the middle or non- 
verbal group, and Tests 10 and 11 (Form Board and 
Passalong) have a much larger residue than the other 
tests in the middle square. These facts suggest further 
purging the battery of 14 and either 10 or 11. It is very
frequently necessary to “purge” a battery before the
proper loadings of the remaining tests can be ascertained.

Residues

          1   2   3   4 |   8   9  11 |  12  13 |  g loadings

     1    .  57  64  52 |  08 -08  02 |  10 -11 |    .424
     2   57   .  48  45 |  05  07  03 |  00 -09 |    .455
     3   64  48   .  52 |  00 -05  01 |  07 -04 |    .436
     4   52  45  52   . | -02  02 -11 | -05  15 |    .368

     8   08  05  00 -02 |   .  28  13 | -06 -01 |    .842
     9  -08  07 -05  02 |  28   .  25 |  00  04 |    .633
    11   02  03  01 -11 |  13  25   . | -04  09 |    .437

    12   10  00  07 -05 | -06  00 -04 |   .  13 |    .835
    13  -11 -09 -04  15 | -01  04  09 |  13   . |    .522

10. The purged battery—When we do this (the reader 
should rewrite the tables and carry out the work), we get 
the loadings and residues shown at the foot of page 30. 
This table is much more like our artificial model. None 
of the correlation coefficients in the side squares are far 
from zero—we shall learn later how to decide whether they 
are, in fact, small enough to be ignored. Meanwhile, let us 
assume this, and suppose, that is to say, that these three 
groups of tests really are exclusive of one another in their 
second group factors. Their loadings in these we could 
then proceed to calculate. This is easily done in the middle 
group, where there are exactly three tests. We have : 


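$$m_8^2 = \frac{.28 \times .13}{.25} = .1456, \qquad m_8 = .382$$

$$m_9^2 = \frac{.28 \times .25}{.13} = .5384, \qquad m_9 = .734$$

$$m_{11}^2 = \frac{.25 \times .13}{.28} = .1161, \qquad m_{11} = .341$$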


The equations of these three tests are therefore: 


$$z_8 = .842\,g + .382\,h + .381\,s_8$$

$$z_9 = .633\,g + .734\,h + .246\,s_9$$

$$z_{11} = .437\,g + .341\,h + .832\,s_{11}$$


where the group factor common to them is given the non- 
committal name h. The coefficients of the specifics are 
settled by the fact that the sum of the squares of the co- 
efficients of such an equation (since the factors are inde- 
pendent) must equal unity. It will be noticed that Test 11 
(Passalong) has here a large specific. It probably shares a 
good deal of this with Test 10 (Form Board) which we 
excluded from the battery meanwhile for this very reason.* 
We cannot similarly calculate the group factor loadings of 
the third group of tests, for there are only two of them and 


* It should be repeated at this point that this example is purely
illustrative, and no conclusions about actual tests may be drawn
from this or from any of our examples. This is a book about
factorial methods, not results.

three tests are necessary. We only know that the product
of their two group factor loadings is .13. This emphasizes
the necessity, in planning a bifactor battery, to have a 
sufficient number of tests. There must be at least three 
groups, and at least three tests in each group. 
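
The loadings of the middle group, and the specifics that complete each equation, can be checked with the short sketch below. It starts from the three residues .28, .13, and .25 and the g loadings .842, .633, and .437; rounding the group loadings to three places before finding the specifics is our assumption about the order of the arithmetic, and the names are ours.

```python
from math import sqrt

residue = {(8, 9): 0.28, (8, 11): 0.13, (9, 11): 0.25}   # residues in the middle group
g = {8: 0.842, 9: 0.633, 11: 0.437}                      # g loadings of the purged battery

def pair(x, y):
    return residue[tuple(sorted((x, y)))]

def group_loading(t, a, b):
    """Loading of test t on the group factor h, from the three residues."""
    return round(sqrt(pair(t, a) * pair(t, b) / pair(a, b)), 3)

m = {8: group_loading(8, 9, 11), 9: group_loading(9, 8, 11), 11: group_loading(11, 8, 9)}

# The squared coefficients of each equation must sum to unity, which fixes the specific.
s = {t: round(sqrt(1 - g[t] ** 2 - m[t] ** 2), 3) for t in g}
print(m)
print(s)
```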

The first group has four tests, and our first step should be 
to see whether its tetrad-differences are zero. If they were 
exactly zero, it would be immaterial which three of the 
four tests we chose to calculate loadings from. Here the 
tetrad-differences, though small (.0084, .0384, .0468), are
not exactly zero. We shall defer to the next chapter
(page 43) the question of how to make the best estimate
of the loadings under these circumstances, but the reader
might care to calculate them from every possible three of
the four tests and average the results. Our illustration has 
served its purpose of bringing to light difficulties which do 
not exist in an artificial example made to avoid raising 
them. 
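
Both suggestions can be tried in a few lines. The sketch below takes the six residues of the first group in the purged battery (in hundredths), forms the three tetrad-differences, and averages the loading of a test over every possible three of the four tests. The averaging is the rough expedient mentioned above, not the best estimate deferred to the next chapter; the function names are ours.

```python
from itertools import combinations
from math import sqrt

# Residues (hundredths) among Tests 1, 2, 3, 4 of the purged battery.
RES = {(1, 2): 57, (1, 3): 64, (1, 4): 52, (2, 3): 48, (2, 4): 45, (3, 4): 52}

def r(i, j):
    return RES[min(i, j), max(i, j)]

# The three tetrad-differences, in units of .0001.
tetrads = sorted([
    abs(r(1, 2) * r(3, 4) - r(1, 3) * r(2, 4)),
    abs(r(1, 2) * r(3, 4) - r(1, 4) * r(2, 3)),
    abs(r(1, 3) * r(2, 4) - r(1, 4) * r(2, 3)),
])

def averaged_loading(t):
    """Average of the loadings of test t found from every triad containing it."""
    others = [x for x in (1, 2, 3, 4) if x != t]
    estimates = [sqrt(r(t, a) * r(t, b) / r(a, b) / 100) for a, b in combinations(others, 2)]
    return sum(estimates) / len(estimates)

print(tetrads)
print([round(averaged_loading(t), 3) for t in (1, 2, 3, 4)])
```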


CHAPTER III

SAMPLING ERROR AND THE THEORY OF TWO
FACTORS


1. Sampling error. The general idea underlying the
notion of a sampling error is not a difficult one. Take, for
example, the average height of all living Englishmen who 
are of full age. This could, if need be, be ascertained by the
process of measuring every living Englishman of full age.
Actually this has never been done, and when anyone makes
a statement such as “The average height of Englishmen is
67½ inches,” he is basing it upon a sample only. This
sample may not be an unbiased one. Indeed, samples of 
Englishmen whose height has been officially recorded are 
heavily loaded with certain classes of Englishmen—for 
example, prisoners in gaol, and unemployed young men 
joining the army of preconscription days. The average 
height of such men may well differ from that of all English- 
men. But when we speak of sampling error, we do not 
mean error due to the sample being known to be a biased 
one. Even if the sample of Englishmen used to find the 
average height of their race were, as far as could be seen, a 
perfectly fair sample, containing the proper proportion of 
all classes of the community and of all adult ages, etc., it 
yet would not necessarily yield an average exactly equal 
to that of all Englishmen. Several apparent replicas of the 
sample would yield different averages. It is these differ- 
ences, between statistics gathered from different but 
equally good samples, that we mean by sampling errors. 
It is worth while calling attention at this point to a 
general fact which will be found of importance at a later 
stage of this book. The true average height of Englishmen
is only so by definition, and does not in principle differ 
from the average of a sample. We had to define the popu- 
lation we had in mind as “all living Englishmen of full 
age.” This is a perfectly well-marked body of men. But
it is in its turn only a sample: a sample of all living [...]
or all living men. It is, indeed, altering daily
as men die or reach the age of 21, and each [...]
is a sample of those that have been and may be. Those
who reach the age of 21 are only some, and therefore
only a sample, of those born. And even those born are
only a sample of those who might have been born had
times been better or had there been no war, or a tax on
[...]. So the idea of sampling is a relative one, and
the “population” from which we take samples
is a matter of definition only. The mathematical problem
in connexion with sampling which it is desirable to solve
if possible for each statistic is to find the complete law of
its distribution when it is derived from each of a large
number of samples of a given size. Mathematically this
is often very difficult, and frequently we have to be
[...] with a [...] which gives its [...]
variance if certain [...] are allowed and certain
[...]

it is in Figure 9, which is the distribution of single
measurements. 

If a sample were made with some special end in view, 
such as ascertaining whether red-headed men tend to be 
tall, we would decide whether we had detected such a 
tendency by calculating the probability that a mean such 
as our red-headed sample showed, or a mean still farther 
away from M, would occur at random. For this purpose 
we would compare the deviation of our sample from M 
with the standard deviation of the distribution of such
samples, obtained by dividing the standard deviation of 
individuals by the square root of P, the number in the 
sample. The ratio of the deviation found, to the standard 
deviation, is the criterion, and the larger it is the more 
likely is it that red-headed men really do tend to be tall. 
For many practical purposes we take a deviation of over
twice the standard deviation as “significant.”

Sometimes the reader will find significance questions 
discussed in terms of the “ probable error ” instead of the 
standard deviation. The probable error is best considered 
as a conventional reduction of the standard deviation (or 
standard error, as it is sometimes called) to two-thirds of 
its value (more exactly, to .67449 of its value).
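
The figure .67449 is the deviation, in units of the standard deviation, within which half of a normal distribution lies. A one-line check with Python's standard library follows; this is our illustration and plays no part in the original argument.

```python
from statistics import NormalDist

# Half the area of a normal curve lies within +/- ratio standard deviations,
# so the upper limit is the 75th percentile of the standard normal distribution.
ratio = NormalDist().inv_cdf(0.75)
print(round(ratio, 5))
```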

Not only would the average height, or the average weight, 
of the sample of red-headed men differ from sample to 
sample. Statistics calculated in more complex ways from 
the measurements will also vary from sample to sample, 
as, for example, the variance of height, or the variance of 
weight, or the correlation of height and weight. Let us 
consider first the variance of the heights. In the whole 
population this is calculated by finding the mean, expres- 
sing every height as a plus or minus deviation from the 
mean, squaring all these deviations, and dividing the sum 
by the number in the population.

This is also how we would find the variance of the sample 
if we really want the variance of the sample. But if we 
want an estimate of the variance in the whole population, 
and the sample is small, it is better to divide by one less 
than the number in the sample. A glimpse of the reason
for this can be got by considering the case of the smallest
possible sample, namely, one man. Here the mean of the
sample is the one height that we have measured, and the 
deviation of that measurement from the mean of the sample 
is zero. The formula if we divide by the number in the 
sample (one) will give zero for the variance—and that is 
correct for the sample. But it would be too bold to estimate 
the variance of the whole population from one measurement: 
if we divide by one less than the sample we get variance 
= 0/0, that is, we don’t know, which is a wiser state-
ment.*

More generally we can begin to understand the reason 
for dividing by (p — 1) instead of by p by the following 
considerations. 

The quantity we want to estimate is the mean square 
deviation of the measurements of the whole population, 
the deviations being taken from the mean of that whole 
population. We do not, however, know that true mean, 
and therefore in a sample we are reduced to using the mean 
of the sample, which except by a miracle will not exactly 
coincide with the true or population mean. The conse- 
quence is that the sum of the squares we obtain is smaller 
than it would have been had we known and used the true 
mean. For it is a property of a mean that the sum of the 
squares of deviations from it is smaller than of deviations 
from any other point. 


* It is important to remember that sampling the population is not
the only source of error in the measurement of statistics, e.g. the 
correlation coefficient. All sorts of influences may disturb it. These 
will usually “attenuate” the correlation coefficient, i.e. tend to
bring it nearer to zero, as can be seen when we consider that a perfect 
correlation only can be reduced by error. But they will not always 
do so, and if the errors in the two trait measurements are themselves 
correlated, they may even increase the true correlations in a majority 
of cases. An estimate of the amount of variable error present can 
be made from the correlation of two measurements of the same 
trait on the same group, a correlation called the “ reliability,” which 
should be perfect if no variable errors are present. Spearman’s cor- 
rection for attenuation (see Brown and Thomson, 1925, 156) is based 
upon this. Like all estimates, the correction for attenuation is correct, 
even if the errors are uncorrelated, only on the average and not in 
each instance, and it should never be used unless it is small. If it 
is large, the experiments are “unreliable” and should be improved.

Consider for example the numbers 2, 3, and 7. Their
mean is 4, and the sum of the squares about 4 is

$$(-2)^2 + (-1)^2 + 3^2 = 14.$$

About any other point this sum will be greater than 14.
About 5, for example, the sum is

$$(-3)^2 + (-2)^2 + 2^2 = 17.$$

About 2 the sum is

$$0^2 + 1^2 + 5^2 = 26.$$


It follows that the sum of the squares we obtained by 
using the sample mean was as small as possible, and in the 
immense majority of cases smaller than the sum about the 
true mean. It is to compensate for this that we divide 
by (p - 1) instead of by p.
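The arithmetic of this example can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `sum_sq_about` is ours and does not come from the text.

```python
def sum_sq_about(values, point):
    """Sum of squared deviations of `values` from `point`."""
    return sum((x - point) ** 2 for x in values)

sample = [2, 3, 7]
p = len(sample)
mean = sum(sample) / p                      # the sample mean, 4.0

about_mean = sum_sq_about(sample, mean)     # smallest possible sum
about_five = sum_sq_about(sample, 5)
about_two = sum_sq_about(sample, 2)

# Dividing the minimal sum by (p - 1) rather than p gives the variance estimate.
variance_estimate = about_mean / (p - 1)
```

Any other point substituted for 5 or 2 gives a sum larger than the sum about the mean.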

These elementary considerations do not of course indi- 
cate just why this procedure should, in the long run, ex- 
actly compensate for using the sample mean. Why not 
(p - 2), one might say, or (p - 3)? It is not possible, in
an elementary account like the present, to answer this. 
Geometrical considerations, however, throw some further 
light on the problem. The p measurements of the sample 
may be thought of as existing in a certain space of (p — 1) 
dimensions. For example, two points define a line (of one 
dimension), three points define a plane (of two dimensions), 
and so on. The true mean of the whole population is not 
likely to be within that space, whereas the mean of the 
sample is. The deviations we have actually squared and 
summed are therefore in a space of one dimension less than 
the space containing the true mean. One “ degree of free- 
dom ” has been lost by the fact that we have forced the 
lines we are squaring to exist in a space of (p - 1) dimensions
instead of permitting them to project into a
p-space. Hence the division by (p — 1) instead of p. 
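Although the text does not prove that (p - 1) is exactly the right divisor, the claim can be checked by brute force on a tiny case. The sketch below is our own illustration, not the author's method: it treats the numbers 2, 3, 7 as a whole population, draws every possible sample of p persons with replacement, and averages the resulting estimates using exact fractions. The function name `mean_of_estimates` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_of_estimates(population, p, divisor):
    """Average, over all samples of size p drawn with replacement, of
    (sum of squares about the SAMPLE mean) / divisor."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=p):
        sample_mean = Fraction(sum(sample), p)
        sum_sq = sum((x - sample_mean) ** 2 for x in sample)
        total += sum_sq / divisor
        count += 1
    return total / count

population = [2, 3, 7]
pop_mean = Fraction(sum(population), len(population))
true_variance = sum((x - pop_mean) ** 2 for x in population) / len(population)

p = 3
with_p_minus_1 = mean_of_estimates(population, p, p - 1)
with_p = mean_of_estimates(population, p, p)
```

Under these assumptions, dividing by (p - 1) reproduces the population variance on the average, while dividing by p falls short of it.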

This principle goes farther. For each statistic which we 
calculate from the sample itself and use in our subsequent 
calculations, we lose a “ degree of freedom.” 

The standard error of a variance v, if the parent population
from which the samples are drawn is normally distributed,
is estimated as:

$$v \sqrt{\frac{2}{p - 1}}$$

where p is the number of persons in the sample. The
standard error of a correlation coefficient r is, with the
same condition, estimated as:

$$\frac{1 - r^2}{\sqrt{p - 1}}$$

The use of this standard error, however, should be dis- 
continued (unless the sample is large and r small). 

Fisher (1925, page 202) has pointed out that the use of the 
formula for the standard error of a correlation coefficient 
is valid only when the number in the sample is large and 
when the true value of the correlation does not approach
±1. For in small samples the distribution of r is not
normal, and even in large samples it is far from normal
for high correlations. The distribution of r for samples
from a population where the correlation is zero differs
markedly from that where the correlation is, say, 0.8.
This means that the use of a standard error for testing 
the significance of correlation coefficients should, except 
under the above conditions, be discouraged. 

To get over the difficulty Fisher transforms r into a new
variable z given by:

$$z = \tfrac{1}{2}\{\log_e(1 + r) - \log_e(1 - r)\} = r + \tfrac{1}{3}r^3 + \tfrac{1}{5}r^5 + \dots$$

It is not, however, necessary to use this formula, as complete
tables have been published for converting values
of r into the corresponding values of z. As r goes from -1
to +1, z goes from $-\infty$ to $+\infty$, and r = 0 corresponds
to z = 0.

The great advantage of using z as a variable instead of r 
is that the form of the distribution of z depends very little 
upon the value of the correlation in the population from 
which samples are drawn. Though not strictly normal, it
tends to normality rapidly as the size of the sample is
increased, and even for small samples the assumption
of normality is adequate for all practical purposes. The
standard deviation of z may in all cases be taken to be
$1/\sqrt{p - 3}$, where p is the number of persons in the sample.
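Where tables of z are not to hand, the transformation is easily computed. The following is a minimal sketch; the function names are ours, and the formula used is the logarithmic one, which is the same quantity as the inverse hyperbolic tangent of r.

```python
import math

def fisher_z(r):
    """Fisher's z = 0.5 * (ln(1 + r) - ln(1 - r)), for -1 < r < 1."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))

def z_standard_deviation(p):
    """Standard deviation of z for a sample of p persons, taken as 1/sqrt(p - 3)."""
    return 1 / math.sqrt(p - 3)

z_zero = fisher_z(0.0)
z_high = fisher_z(0.8)
sd_for_fifty = z_standard_deviation(50)
```

The standard deviation depends only on p, not on the size of the correlation, which is the practical advantage of working in z.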

3. Error of a single tetrad-difference. For our discussion
of the influence of sampling on the factorial analysis of
tests one of the most important quantities to know is the
standard error of the tetrad-difference. There has been
much debate concerning the proper formula for this. (See
Spearman and Holzinger, 1924, 1925, 1929; Pearson and
Moul, 1927; Wishart, 1928; Pearson, Jeffery, and Elderton,
1929; Spearman, 1931.) That generally employed is
formula (16) in the Appendix to Spearman’s The Abilities
of Man:

Standard error of $r_{13}r_{24} - r_{23}r_{14}$ =

$$\frac{2}{\sqrt{N}} \sqrt{\bar{r}^2 (1 - \bar{r})^2 + (1 - 2\bar{r}^2)\, s^2}$$

[Spearman and Holzinger’s formula (16).]

where N is the number of persons in the sample,*
$\bar{r}$ is the mean of the four correlation coefficients, and
$s^2$ is their mean squared deviation (variance) from $\bar{r}$.

The probable error is .6745 times the above. A worked
example will be found on page xii of Spearman’s Appendix,
using (which is all one can do) the observed values of the r’s.

It will be remembered that in Section 7 of Chapter I
we stated Spearman’s discovery in the form “tetrad-differences
tend to be zero.” If tetrad-differences in the
whole population, however, were all actually zero, they
would not remain exactly zero in samples, and it is only
samples that are available to us. We are faced, therefore,
with a two-fold problem. (a) We have to decide, from the
size of the tetrad-differences actually found in our sample,
whether the sample is compatible with the theory that the
tetrad-differences are zero in the whole population. But
(b) we should also go on to consider whether the sample is
equally compatible with the opposed hypothesis that the
tetrad-differences are not zero in the whole population,
leaving a verdict of “not proven.” (See Emmett, 1936.)

* We use p to mean the number of persons in this book, but are
retaining N here and in “formula 16a” below to preserve the usual
appearance of these well-known and much-used expressions.

4. Distribution of a group of tetrad-differences.—The 
actual calculation, for every separate tetrad-difference, of 
its standard error by Spearman and Holzinger’s formula 
(16) is, however, an almost impossibly laborious task. In 
a table of correlations formed from n tests there are 
n(n - 1)/2 correlation coefficients, and n(n - 1)(n - 2)(n - 3)/8
different (though not independent) tetrad-differences.
Any one particular correlation coefficient is
concerned in (n - 2)(n - 3) different tetrad-differences,
and any one test in (n - 1)(n - 2)(n - 3)/2 different
tetrad-differences. Thus with ten tests there are 630
tetrad-differences, and with twenty tests 14,535 tetrad- 
differences. In the latter case, any one test is concerned 
in 2,907. Under these circumstances, it is natural to look 
for a more wholesale method than that of calculating the 
standard error of each tetrad-difference. The method 
adopted by Spearman is to form a table of the distribution 
of the tetrad-differences, and compare this distribution 
with that of a normal curve centred at zero and with 
standard deviation given by— 

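$$\frac{2}{\sqrt{N}} \sqrt{\bar{r}^2 (1 - \bar{r})^2 + (1 - R)\, s^2}$$

[Spearman and Holzinger’s formula (16a).]

where N = number of persons in the sample,
$\bar{r}$ = the mean of all the r’s in the whole table,
$s^2$ = their mean squared deviation from $\bar{r}$,
$R = 3\bar{r}\,\dfrac{n - 4}{n - 2} - 2\bar{r}^2\,\dfrac{n - 6}{n - 2}$, and
n = number of tests.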
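The counts quoted earlier in this section (630 tetrad-differences for ten tests, 14,535 for twenty) follow directly from the expressions in n, and can be confirmed by enumeration, since every set of four tests yields three tetrad-differences (disregarding sign). The sketch below is an illustration only; the function names are ours.

```python
from itertools import combinations

def tetrad_counts(n):
    """Counts for a table of correlations formed from n tests."""
    return {
        "correlations": n * (n - 1) // 2,
        "tetrad_differences": n * (n - 1) * (n - 2) * (n - 3) // 8,
        "per_correlation": (n - 2) * (n - 3),
        "per_test": (n - 1) * (n - 2) * (n - 3) // 2,
    }

def count_by_enumeration(n):
    """Brute-force check: three tetrad-differences per set of four tests."""
    return 3 * sum(1 for _ in combinations(range(n), 4))

ten = tetrad_counts(10)
twenty = tetrad_counts(20)
```

The rapid growth of these numbers with n is what makes a wholesale method necessary.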

Numerous examples of the comparison of “histograms”
of tetrad-differences with normal curves whose standard
deviation is found by (16a) are given in Spearman’s The
Abilities of Man. This method of establishing the hypothesis,
that the tetrad-differences are derived by sampling
from a population in which they are really zero, is open to
the same doubt as was explained in the simpler case of
one tetrad-difference. The comparison can prove that
the tetrad-differences observed are compatible with that
hypothesis. It does not in itself prove that they are
compatible with that hypothesis only; and, as Emmett
has shown in the article already mentioned, the odds are
commonly rather against this.
The usual practice, moreover, is to “ purify ” the battery 
of tests until the actual distribution of tetrad-differences
agrees with (16a), so that in effect all that is then proved 
is that a team can be arrived at which can be described in 
terms of two factors. This, although a more modest 
claim than has often been made, and certainly less than 
is implicitly understood by the average reader, is never- 
theless a matter of some importance. Not all teams of 
tests can be explained by one common factor; but it is 
not very difficult to find teams which can. There is little
doubt in the minds of most workers that a tendency towards 
hierarchical order actually exists among mental tests. 

5. Spearman’s saturation formula. It will be remembered
from Section 4 of Chapter I that the calculation of
the g saturation of each test forms an important part of
the Spearman process. We saw there that in a hierarchical
matrix each correlation is the product of the two g saturations
of the tests, for example:

$$r_{34} = r_{3g}\, r_{4g}$$

Since this is so, each g saturation can be calculated
from the correlations of a test with two others, and their
inter-correlation. Thus to find $r_{1g}$ we can take Tests 2 and
3 as reference tests, when we have

$$\frac{r_{12}\, r_{13}}{r_{23}} = \frac{r_{1g} r_{2g} \cdot r_{1g} r_{3g}}{r_{2g}\, r_{3g}} = r_{1g}^2$$

When the matrix is really hierarchical, and there are
no sampling errors present, it is immaterial which two tests
we associate with Test 1 in order to find its g saturation.
We have, in fact, in that case:

$$r_{1g}^2 = \frac{r_{12}\, r_{13}}{r_{23}} = \frac{r_{14}\, r_{15}}{r_{45}} = \dots \text{ etc.}$$


But even if the correlations, measured in the whole
population, were really exactly hierarchical, sampling
errors would make these fractions differ somewhat from
one another, and we are faced with the problem of deciding
which value to accept for the g saturation. The average
of all possible fractions like the above would be one very
plausible quantity to take but is laborious to compute.
Spearman therefore adopts a fraction:

$$\frac{r_{12} r_{13} + r_{12} r_{14} + r_{13} r_{14} + \text{etc.}}{r_{23} + r_{24} + r_{34} + \text{etc.}}$$

whose numerator is the sum of the numerators, and whose
denominator is the sum of the denominators, of the single
fractions. This combined fraction he computes in a
tabular manner which we will next describe, by the
algebraically equivalent formula:

$$r_{1g}^2 = \frac{A_1^2 - A_1'}{T - 2A_1}$$

[Spearman’s formula (21), Appendix, Abilities of Man.]


The quantities $A_1$, $A_2$, etc., are the sums of the rows (or
columns) of the matrix of correlations without any entries
in the diagonal cells. (The arithmetical example is confined
to five tests to economize space):

          1      2      3      4      5   |    A       A²
    1     .     .50    .34    .33    .24  |  1.41    1.988
    2    .50     .     .56    .32    .15  |  1.53    2.341
    3    .34    .56     .     .13    .35  |  1.38    1.904
    4    .33    .32    .13     .     .29  |  1.07    1.145
    5    .24    .15    .35    .29     .   |  1.03    1.061

                                        T =  6.42

T is the sum of all the A’s, and therefore of all the
correlations in the table (where each occurs twice). A
new table is now written out, with each coefficient squared,
and its rows summed to obtain the quantities A′:

          1      2      3      4      5   |    A′
    1     .    .250   .116   .109   .058  |  .533
    2   .250     .    .314   .102   .023  |  .689
    3   .116   .314     .    .017   .123  |  .570
    4   .109   .102   .017     .    .084  |  .312
    5   .058   .023   .123   .084     .   |  .288

The calculation of all the saturations is then best performed
in a tabular manner, thus:

    Test |   A²      A′    A² - A′    2A    T - 2A   (A² - A′)/(T - 2A)   Saturation
    1    | 1.988   .533    1.455    2.82    3.60          .4042              .66
    2    | 2.341   .689    1.652    3.06    3.36          .4917              .70
    3    | 1.904   .570    1.334    2.76    3.66          .3645              .60
    4    | 1.145   .312     .833    2.14    4.28          .1946              .44
    5    | 1.061   .288     .773    2.06    4.36          .1773              .42


(where the last column is the square root of the preceding).
The reader should calculate the six different values of
$r_{1g}$ from the original table by the formula $\sqrt{r_{1i}\, r_{1j} / r_{ij}}$,
for comparison with the value .66 obtained above. He
will find

    .55    .72    .89    .93    .48    .52

with an average of .68.
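The tabular calculation of this section can be repeated mechanically. The sketch below is our own illustration, not Spearman's layout: it forms the row sums A, the sums of squares A′, and the total T from the five-test table, applies the combined fraction (A² - A′)/(T - 2A), and also computes the six single-fraction values for Test 1. The function names are hypothetical.

```python
import math

# Correlations of the five-test example; diagonal cells are left at zero.
R = [
    [0.00, 0.50, 0.34, 0.33, 0.24],
    [0.50, 0.00, 0.56, 0.32, 0.15],
    [0.34, 0.56, 0.00, 0.13, 0.35],
    [0.33, 0.32, 0.13, 0.00, 0.29],
    [0.24, 0.15, 0.35, 0.29, 0.00],
]

def spearman_saturations(matrix):
    """g saturations as sqrt((A^2 - A') / (T - 2A)) for each test."""
    row_sums = [sum(row) for row in matrix]                # the A's
    sq_sums = [sum(r * r for r in row) for row in matrix]  # the A' values
    total = sum(row_sums)                                  # T
    return [
        math.sqrt((a * a - a_sq) / (total - 2 * a))
        for a, a_sq in zip(row_sums, sq_sums)
    ]

def pairwise_saturations(matrix, i):
    """All single-fraction values sqrt(r_ij * r_ik / r_jk) for test i."""
    others = [k for k in range(len(matrix)) if k != i]
    return [
        math.sqrt(matrix[i][j] * matrix[i][k] / matrix[j][k])
        for pos, j in enumerate(others)
        for k in others[pos + 1:]
    ]

saturations = spearman_saturations(R)
six_values = pairwise_saturations(R, 0)
average = sum(six_values) / len(six_values)
```

Worked without intermediate rounding, this gives saturations of about .64, .70, .60, .44, and .42. The last four agree with the tabulated figures; for Test 1 the square root of .4042 comes to about .64, a little below the .66 entered in the table. The six single-fraction values and their average of .68 are reproduced as printed.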


6. Residues. If the correlations which would arise from
these saturations or loadings are calculated, and subtracted
from the observed correlations, we obtain the residues
which have then to be examined to see if they are small
enough to be attributable to sampling error. In the
following double table of correlations are set out the observed
correlations uppermost, and those calculated from
the g saturations below. The difference is the residue,
which may be plus or minus:

    g Loadings |  .66    .70    .60    .44    .42
    -----------+----------------------------------
       .66     |   .     .50    .34    .33    .24
               |   .     .46    .40    .29    .28
       .70     |  .50     .     .56    .32    .15
               |  .46     .     .42    .31    .29
       .60     |  .34    .56     .     .13    .35
               |  .40    .42     .     .26    .25
       .44     |  .33    .32    .13     .     .29
               |  .29    .31    .26     .     .18
       .42     |  .24    .15    .35    .29     .
               |  .28    .29    .25    .18     .

The lower numbers are the products of the two
saturations. In this case the residues range from -.14
to +.14 and at first sight appear in many cases to be
too large to be neglected in comparison with the original
correlations.

To check this impression, consider the correlation .56
and the value .42 from which it is supposed to depart only
by sampling error, a deviation of .14. Fisher’s z corresponding
to r = .42 is .45, and that corresponding to r =
.56 is z = .63, so that the z deviation is .18. The standard
deviation of z for 50 cases is 1 ÷ √47 = .15. The deviation
is little larger than one standard deviation and cannot
therefore be called significant. But as the reader will observe,
this conclusion is due more to the large size of the
standard error than to the small size of the residue. The
residue is here attributable to sampling error, because the
latter is so large. But because the latter is large it does not
follow that the large residue is certainly due to it.
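The residues and the z check just described can be reproduced as follows. This is a sketch under the text's own figures (the loadings .66, .70, .60, .44, .42 and a sample of 50 persons); the variable and function names are ours.

```python
import math

loadings = [0.66, 0.70, 0.60, 0.44, 0.42]

# Observed correlations of the five-test example, keyed by (test, test).
observed = {
    (1, 2): 0.50, (1, 3): 0.34, (1, 4): 0.33, (1, 5): 0.24,
    (2, 3): 0.56, (2, 4): 0.32, (2, 5): 0.15,
    (3, 4): 0.13, (3, 5): 0.35, (4, 5): 0.29,
}

# Residue = observed correlation minus the product of the two g loadings.
residues = {
    pair: r - loadings[pair[0] - 1] * loadings[pair[1] - 1]
    for pair, r in observed.items()
}

def z_deviation_in_sd_units(r_observed, r_expected, persons):
    """Difference of the two Fisher z values, divided by 1/sqrt(persons - 3)."""
    z_gap = math.atanh(r_observed) - math.atanh(r_expected)
    return z_gap * math.sqrt(persons - 3)

smallest = min(residues.values())
largest = max(residues.values())
ratio = z_deviation_in_sd_units(0.56, 0.42, 50)
```

The ratio comes out between one and one and a half standard deviations, in agreement with the remark that the deviation is little larger than one standard deviation.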

7. Reference values for detecting specific correlation. If 
after a calculation like that described, one of the residues 
is found to be too large to be explicable by sampling error, 
the excess of correlation over that due to g is attributed to 
“ specific correlation,” meaning correlation due to a part 
of their specific factors being not really unique but shared 
by these two tests. In the case of our numerical example, 
if the number of subjects tested had been larger, the standard 
errors of the coefficients would have been smaller, and some 
of the discrepancies between the experimental values and 
those calculated from the g saturations would have been 
too large to be overlooked, but would have had to be 
attributed to specific correlation. In such a case, the g 
loadings would, of course, be wrong and would have to be 
recalculated from the battery after one of the tests concerned
in the specific correlation was removed from it.
Later, the other test could be replaced in the battery
instead of the first, and thus its g saturation found. The
difference between the experimental correlation of the 
two, and the product of their g saturations, with a standard 
error dependent on the size of the sample, would be then 
attributed to their specific linkage. 

If two tests, v and w, are thus suspected of having a
specific link as well as that due to g, it is clear that the
smallest battery of tests which could be used in the above
manner to detect that link would be one of two other tests,
x and y, say, to make up a tetrad:

              v        w
      x     r_xv     r_xw
      y     r_yv     r_yw


and these two “reference” tests would have to be known
to have no specific links with each other or with the two
suspected tests. The example which gave rise to Figure 5
(see Chapter I, page 15) illustrates this. Tests 2 and 3
there are, let us suppose, those with a suspected specific
link. The tetrad-difference to be examined by means of
Spearman’s formula (16) is that which has $r_{23}$ as one corner.
In such a case, where the two reference tests 1 and 4 are 
known to have no link except g with one another, or with 
the other two tests, two of the possible tetrad-differences 
ought to be larger than three times the standard error 
given by formula (16), and equal to one another, while the 
third tetrad-difference should be zero (or sufficiently near 
to zero, in practice) (Kelley, 1928, 67). 

The g saturation of each of the tests under examination 
for specific correlation can be found by grouping it with 
the two reference tests. Thus in the case of our Figure 5, 
we have— 


$$r_{2g}^2 = \frac{r_{12}\, r_{24}}{r_{14}} = .5, \qquad r_{3g}^2 = \frac{r_{13}\, r_{34}}{r_{14}} = .5$$

Therefore the correlation between 2 and 3 which is due
to g is:

$$r_{2g}\, r_{3g} = \sqrt{.5} \times \sqrt{.5} = .5$$

and the difference between this and .8, the actual value,
is the part to be explained by the specific factor shared by
the two tests.

When there are several reference tests available, all
believed to have no link except g with one another or with
the two tests suspected of specific overlap, there will be
a number of ways of picking two of them to obtain the
tetrad required to decide the matter, and the results will,
because of sampling and other errors, be discrepant. Under
these circumstances Spearman has devised an interesting 
procedure for amalgamating the results into one. A 
numerical example is given by him on page xxii of the 
Appendix to The Abilities of Man. 


CHAPTER IV 
THE DEFINITION OF g


1. Any three tests define a g. The idea of g arose out of
Professor Spearman’s acute observation that correlation
coefficients between tests tend to show hierarchical order;
that is, that their tetrad-differences tend to be zero or small;
or in more technical terms still, that the rank to which a
matrix of correlation coefficients can be “reduced” by
suitable diagonal elements tends towards rank one. This
fundamental fact is at the basis of all those methods of 
factorial analysis which magnify specific factors. In con- 
sequence, correlation coefficients between a number of vari- 
ables can be adequately accounted for by a few common 
factors. To be adequately described by one only—a g— 
the reduced rank of the correlation matrix has to be 
one, within the limits of sampling error. 

Suppose now that we have three tests and have, in the 
whole population, measured their correlation coefficients. 
If, as is usually the case, these coefficients are all positive, 
and if each of them is at least as large as the product of the 
other two, we can explain them by assuming one g and 
three specifics $s_1$, $s_2$, and $s_3$. There are many other ways
of explaining them, but let us adopt this one. We have
thereby defined a factor g mathematically (Thomson, 1935a, 
260). It is then for the psychologist to say, from a 
consideration of the three tests which define it, what name 
this factor shall bear and what its psychological description 
is. The psychologist may think, after studying the tests, 
that they do not seem to him to have anything in common, 
or anything worth naming and treating as a factor. That 
is for him to say. Let us suppose that at any rate he does 
not reject the possibility, but that he would like an oppor- 
tunity of studying other tests which (mathematically 
speaking) contain this factor, and have nothing else in 
common, before finally deciding. 
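How three correlation coefficients define a g can be made concrete. In the sketch below, which is an illustration of ours and assumes all three correlations are positive, each squared saturation is the product of a test's two correlations divided by the third correlation. The function name `g_saturations` is hypothetical; the second set of correlations is invented purely to show the excluded case.

```python
import math

def g_saturations(r12, r13, r23):
    """Return (r1g, r2g, r3g) for three positively correlated tests.

    Returns None when any correlation is smaller than the product of the
    other two, since a squared saturation would then exceed unity.
    """
    squared = (
        r12 * r13 / r23,   # r1g^2
        r12 * r23 / r13,   # r2g^2
        r13 * r23 / r12,   # r3g^2
    )
    if any(s > 1 for s in squared):
        return None
    return tuple(math.sqrt(s) for s in squared)

# Tests 1, 2, 3 of the five-test example in the preceding chapter.
defined = g_saturations(0.50, 0.34, 0.56)

# Invented values: .6 is less than .9 x .8, so no admissible g results.
excluded = g_saturations(0.90, 0.80, 0.60)
```

When saturations are returned, their products in pairs reproduce the three correlations exactly, which is all that is meant here by saying that the three tests define a g mathematically.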

In that case the experimenter must search for a fourth
test which, when added to these three, gives tetrad-differences
which are zero; and then for a fifth and further
tests, each of which makes zero tetrad-differences with the
tests of the pre-existing battery. This extended battery
the experimenter would lay before the psychological judge,
to obtain a ruling whether the single common factor, of
which it is the now extended but otherwise unaltered
definition, is worthy of being named as a psychological
factor.

2. The extended or purified hierarchical battery. Mathematically,
any three tests with which the experimenter
cared to begin would define “a” g, if we except temporarily
the case, to which we shall later return, of three correlation
coefficients, one of which is less than the product of the
other two. The experimental tester, however, might in
some cases have great difficulty in finding further tests, to
add to the original three, which would give zero tetrad-differences.
Unless he could do so, it is unlikely that the
psychological judge would accept the factor as worthy of
a name and separate existence in his thoughts. It is, for
example, an experimental fact that starting with three
tests which a general consensus of psychological opinion
would admit to have only “intelligence” as a common
requirement, it has proved possible to extend the battery
to comprise about a score of tests without giving any
tetrad-differences which cannot be regarded as zero.
Even that has not been accomplished without difficulty,
and without certain blemishes in the hierarchy having to be
removed by mathematical treatment. But the fact that
with these reservations it is possible, and that psychological
judgment endorses the opinion that each test of this battery
requires “intelligence,” is the main evidence behind the
actual “existence” of such a factor as “g, general intelligence.”
It must be noted that the word “existence”
here does not mean that any physical entity exists which
can be identified with this g. It does mean, however, that,
as far as the experimental evidence goes, there is some
aspect of the causal background which acts “as if” it
were a single unitary factor in these tests.

* The process of making such a battery of tests to define general
intelligence (see Brown and Stephenson, 1933) has not in fact taken
the form of choosing three tests as the basal definition and then
extending the battery. Instead, a number of tests which, it was
thought from previous experience, would act in the desired way have
been taken, and the battery thus formed has then been purified by
the removal of any tests which broke the hierarchy.

The important point to note is that the experimenter has 
produced a battery of tests which is, he claims, hierarchical; 
that the mathematician assures him that such a battery 
acts “as if” it had only one factor in common (though it 
can also be explained in many other ways), and that the 
psychologist agrees that psychologically the existence of
such a factor as the sole link in this battery seems a reason- 
able hypothesis. 

3. Different hierarchies with two tests in common.—Now, 
it must be remembered that, starting with three other 
tests, which may contain two of the former set, it may 
very well be possible to build up a different hierarchy. 
Only experiment could show whether this were possible in
each case; there is no mathematical difficulty in the way.
Such a hierarchy would also define “a” g, but this would
be usually a different factor from the former g. If there 
were three tests common to the two hierarchies, then the 
two gs could be identified with one another (sampling 
errors apart), and the three tests would be found to have 
the same saturations with the one g as with the other. But 
if only two tests were common to the two batteries this 
would not in general be the case, and the different satura- 
tions of these tests with the two g’s would show that the 
latter were different (Thomson, 1935a, 261-2). Under 
such circumstances the psychologist has to choose. He 
cannot have both these g’s. Both are mathematically of 
equal standing, it is a psychological decision which has to 
be made. When one g is accepted, the other, as a factor, 
must then be rejected and a more complicated factorial 
analysis of the second hierarchy has to be built up which 
is consistent with this. 

4. A test measuring “pure g.” Although the hierarchical
battery defines a g, it does not enable it to be measured
exactly (but only to be estimated) unless either it contains
an infinite number of tests, or a test can be found which
conforms to the hierarchy and has a g saturation of unity.*
In the latter case this test which is “pure g” is such that
when it is considered along with any other two tests of its
hierarchy, its correlations with them, multiplied together,
give the intercorrelation of those two with one another:
if k is the “pure” test, then

$$r_{ik}\, r_{jk} = r_{ij}$$

its g saturation being:

$$\sqrt{\frac{r_{ik}\, r_{jk}}{r_{ij}}} = 1$$


No such “ pure” test of the g which is defined by the 
Brown-Stephenson hierarchy of nineteen tests has yet been 
found. Such a pure test, with full g saturation, must not 
be confused with tests which are sometimes called tests of 
pure g because they do not contain certain other factors, 
in particular the verbal factor. Thus the “S.V.P.”
(Spearman Visual Perception) tests are referred to by
Dr. Alexander (1935, 48) as a “pure measure of g”; but
their saturations with g are given by him (page 107) as
.757, .701, and .736 respectively, so that in each case only
about half the variance is “g” and half is a specific.

5. The Heywood case. Consider the case where three
tests are such that:

$$r_{12}\, r_{13} > r_{23}$$

In such a case the g saturation of Test 1, if we calculate
it, is greater than unity, which is impossible. Yet it
is possible, in theory at least, to add tests to such a triplet
to form an extended hierarchy with zero tetrad-differences.
There can be one such case (but only one) in a hierarchy.
We shall call them Heywood cases, as this possibility was
first pointed out by him (Heywood, 1931). As an artificial
example, consider these correlations:


* It is understood, of course, that even such a test would give 
different measures of a man’s g from day to day, if the man’s per- 
formance in it varied (as it undoubtedly would) from day to day. 
By measuring with exactness is meant, in this part of the text, 
measurement free from the uncertainty due to the factors out- 
numbering the tests. We are assuming sampling errors to be nil. 


case of pure g will leave one of the rows of the above sum
non-zero. To make the whole sum zero, one case must be 
a Heywood case, giving— 


$1 - r^2$ negative.


It would seem, therefore, that by the time we have 
added hierarchical tests to make them equal in number to 
the persons, we will necessarily have added a Heywood 
hierarchical case (of which there can be only one in a 
hierarchy). But we have agreed that the discovery of a 
Heywood case will cause us to abandon the hierarchy as 
a definition of g ! 

The case where the number of tests is increased to equal 
the number of persons may seem to the reader to be an 
academic case only. But the case of reducing the number 
of persons until they equal the number of tests is one which 
could easily be realized in practice, and presents equal 
theoretical difficulties. This draws attention to the 
dependence of any definition of factors on the sample of 
persons tested. If we have a perfect hierarchy of, say, 
50 tests, in a population of, say, 1,000 persons, a sample of 
fifty persons from the above thousand, if it gives hier- 
archical order, will give a Heywood case, and its g will be 
impossible. 

If the g corresponding to the original analysis on the 
thousand persons were anything real, such as a given 
quantity of mental energy available in each person, then 
it ought always to be possible, one might erroneously 
think, to find fifty persons and fifty tests to give a hierarchy, 
without a Heywood case. But that cannot be easily said. 
It is impossible, from the correlations alone, to distinguish 
a real g from one imitated by a fortuitous coincidence of 
specifics. Even if g were a reality, a sample of persons
equal in number to the tests could not give a hierarchy 
without a Heywood case, and their apparent g would be 
fortuitous. 

Now the case of a test of pure g is on the border line of 
the Heywood cases. It is clear then that it will be suspect, 
as being probably only fortuitous, if the number of persons 
does not far exceed the number of tests. 

7. Singly conforming tests. There remains one other
conceivable method of measuring g exactly,* by the use
of certain tests which, when they are all present, destroy
the hierarchy, although any one of them can enter the
battery without marring it: “singly conforming” tests
(Thomson, 1934c; and 1935a, 253-6). It will be shown
in later chapters on factor estimation that the reason
factors cannot be measured exactly, but have to be estimated
only, is that they outnumber the tests. Every
new test which conforms to a hierarchy adds a new specific 
(unless it is pure g), and thus continues the excess of factors 
over tests. It can occur, however, that the correlation of 
two tests with each other breaks a hierarchy, although 
either of them alone conforms otherwise. Such a case 
occurs in the Brown-Stephenson battery, for example, one 
of whose correlation coefficients has to be suppressed before 
the hierarchy is acceptable. 

In such a case, if the psychologist is prepared to accept
either test as a member of the battery, the erring correlation
coefficient must be due to these two tests sharing some
portion of their specifics with one another. If, as may
happen (apart from error which we are supposing absent),
their intercorrelation shows that they have only one specific
factor between them, and differ only in their saturations,
then they enable the estimate of g to be turned into accurate
measurement. For example, consider the following matrix
of correlations:

          1      2      3      4      5      6
    1     .    .669   .592   .458   .335   .251
    2   .669     .    .566   .438   .870   .240
    3   .592   .566     .    .387   .283   .212
    4   .458   .438   .387     .    .219   .164
    5   .335   .870   .283   .219     .    .120
    6   .251   .240   .212   .164   .120     .

This is a perfect hierarchy except for the correlation:

$$r_{25} = .870$$

* By exactly is meant, with the same exactness as the test
scores, without the additional indeterminacy due to an excess of
factors over tests.

Every tetrad-difference which does not contain this
correlation is zero. If either Test 2 or Test 5 is removed
from the battery, there remains a perfect hierarchy. If
Test 5 is removed, we can calculate from the remaining
battery the g saturations:

    Test          |   1      2      3      4      6
    g saturation  | .837   .800   .707   .548   .300

If we remove Test 2 and restore Test 5, we get the following:

    Test          |   1      3      4      5      6
    g saturation  | .837   .707   .548   .400   .300



From either hierarchy we can estimate g. The correlation
of our estimates with “true g” will be:

$$\sqrt{\frac{S}{S + 1}}, \qquad \text{where } S = \sum \frac{\text{saturation}^2}{1 - \text{saturation}^2}$$

and we find for the two hierarchies the g correlations of
.92 and .90.
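The two g correlations can be verified from the saturations of each hierarchy. In this sketch the quantity S is taken as the sum, over the tests of the battery, of saturation² / (1 - saturation²), and the correlation with g as the square root of S / (S + 1); the function name is ours.

```python
import math

def correlation_with_g(saturations):
    """Correlation of the battery's estimate of g with g itself."""
    s = sum(r * r / (1 - r * r) for r in saturations)
    return math.sqrt(s / (s + 1))

# Hierarchy with Test 5 removed (Tests 1, 2, 3, 4, 6).
without_test_5 = correlation_with_g([0.837, 0.800, 0.707, 0.548, 0.300])

# Hierarchy with Test 2 removed (Tests 1, 3, 4, 5, 6).
without_test_2 = correlation_with_g([0.837, 0.707, 0.548, 0.400, 0.300])
```

A saturation approaching unity makes S grow without limit, so that the correlation approaches unity, as it should for a test of pure g.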
From the two Tests 2 and 5 alone, however, we can obtain
a g correlation of unity.
The reason for this is that the correlation of Tests 2
and 5 is such as to show that their specifics are identical,
the two tests differing only in their loadings. Their
equations are:

$$z_2 = .8g + \sqrt{1 - .8^2}\; s_2$$
$$z_5 = .4g + \sqrt{1 - .4^2}\; s_5$$

If the whole of $s_2$ is identical with the whole of $s_5$ their
intercorrelation should be:

$$.8 \times .4 + \sqrt{(1 - .8^2)(1 - .4^2)} = .870$$

and this is its experimental value.

We could, therefore, have seen at the beginning, if we
had tested the above fact, that these two tests would make
a perfect battery for measuring g. We have the simultaneous
equations:

$$z_2 = .8g + .6s$$
$$z_5 = .4g + .917s$$

from which we can eliminate s.
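The elimination of s may be sketched numerically. The values of g and s given to the imaginary person below are invented for illustration, and the function name `recover_g` is ours; the loadings .8 and .4 and the shared specific are those of the example.

```python
import math

a2, a5 = 0.8, 0.4                      # g loadings of Tests 2 and 5
b2 = math.sqrt(1 - a2 ** 2)            # specific loading of Test 2, .6
b5 = math.sqrt(1 - a5 ** 2)            # specific loading of Test 5, about .917

# Intercorrelation implied when the two specifics are one and the same.
r25 = a2 * a5 + b2 * b5

def recover_g(z2, z5):
    """Solve z2 = a2*g + b2*s and z5 = a5*g + b5*s for g, eliminating s."""
    return (b5 * z2 - b2 * z5) / (a2 * b5 - a5 * b2)

# A hypothetical person: these two values are invented for the check.
g_true, s_true = 1.3, -0.7
z2 = a2 * g_true + b2 * s_true
z5 = a5 * g_true + b5 * s_true
g_recovered = recover_g(z2, z5)
```

Because there are two equations and only two unknowns, g is recovered exactly from the two scores, which is the sense in which the pair measures g rather than merely estimating it.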

We see, therefore, that under certain hypothetical
circumstances, a more exact estimate of g can be obtained
from two of these “singly conforming” tests than from the
hierarchy with which they conform individually. Those
circumstances are, that their correlation with one another
(the correlation which breaks the hierarchy because it is
too large) should either equal:

$$r_{ig}\, r_{jg} + \sqrt{(1 - r_{ig}^2)(1 - r_{jg}^2)}$$

or should approach this value.

It cannot in actual practice be expected to equal it, as
in our artificial example. For we have disregarded errors, 
which are sure in some measure to be present. At what 
stage will the pair of singly conforming tests cease to be 
a better measure of g than the better of the two hierarchies 
made by deleting either the one or the other? If in our
example the correlation .870 of Tests 2 and 5 be imagined
to sink little by little, the correlation of their estimate
with g will sink from unity. The better of the two hierarchies
gives a multiple correlation of .922. When the
correlation $r_{25}$ has sunk from .870 to .847, these two singly
conforming tests will give the same multiple correlation,
.922. If this defect from the full .870 is due entirely to
error, then a fall to .847 corresponds to reliabilities of the
two tests of the order of magnitude of .98, if they are
equally reliable. This is a very high reliability, seldom
attained, so that in a case like our example quite a small 
admixture of error would make the singly conforming 
tests no better at estimating g than the hierarchy. We 
are here, however, neglecting the fact that error would also 
diminish the efficiency of the hierarchy. Nevertheless, the 
chance of finding a pair of singly conforming tests, highly
reliable, and having no specifics except that which they
share, seems small, as small as the chance of finding a test
of pure g, perhaps. It might possibly turn out, however,
that a matrix of several (say t) singly conforming tests
would be practicable. Such a set would measure g exactly
if among them they added only t − 1 new specifics to the
hierarchy. Their saturations would be found by placing
them one at a time in the hierarchy, and then their regression
on g calculated by Aitken’s method (see Chapter XIV).
The necessity for the hierarchy in the background, in all
this, is clear: it is there to assure us that each singly
conforming test is compatible with the definition of g, and to
enable its g saturation to be calculated.

8. The danger of “ reifying ” factors.—The orthodox view 
of psychologists trained in the Spearman school is that g is, 
of all the factors of the mind, the most ubiquitous. “ All 
abilities involve more or less g,” Spearman said, although 
in some the other factors are “so preponderant that, for
most purposes, the g factor can be neglected.” With 
this view, the present author has always agreed, provided 
that g is interpreted as a mathematical entity only, and 
judgment is suspended as to whether it is anything more 
than that. 

The suggestion, however, that g is “ mental energy,” of 
which there is only a limited amount available, but avail- 
able in any direction, and that the other factors are the 
neural machines, is one to be considered with caution. 
The word energy has a definite physical meaning. “Mental
energy” may convey the meaning that the energy spoken
of is the same as physical energy, though devoted to mental 
uses. If that meaning is accepted, innumerable difficulties 
follow, not the least being the insoluble questions of the 
connexion of body and mind, and of freewill versus 
determinism. A less obscure difficulty is that there seems 
to be no easily conceivable way in which the “energy”
of the whole brain can be used in any direction indifferently, 
except by the neural engines also all taking part. The 
energy of a neurone seems to reside in it, and the passage 
of a nerve impulse along a neurone seems to resemble 
rather the burning of a very rapid fuse, than the conduction 
of electricity, say, by a wire. 

If “mental energy” does not mean physical energy at
all, but is only a term coined by analogy to indicate that
the mental phenomena take place “as if” there were such
a thing as mental energy, these objections largely disappear.
Even in physical or biological science, the things which are
discussed and which appear to have a very real existence
to the scientist, such as “energy,” “electron,” “neutron,”
“gene,” are recognized by the really capable experimenter
as being only manners of speech, easy ways of putting into 
comparatively concrete terms what are really very abstract 
ideas. With the bulk of those studying science there exists 
always the danger that this may be taken too literally, but 
this danger does not justify us in ceasing to use such terms. 
In the same way, if terms like “mental energy” prove to
be useful, and can be kept in their proper place, they may 
be justified by their utility. The danger of “ reifying ” 
such terms, or such factors as g, b, etc., is, however, very 
great. 




CHAPTER V 
THE CENTROID METHOD 


1. Need of group factors.—The two-factor method of
analysis, described in an earlier chapter, began with the idea 
that a matrix of correlations would ordinarily show perfect 
hierarchical order if care was taken to avoid tests which 
were “unduly similar,” i.e. very similar indeed to one
another. If such were found coexisting in the team of 
tests, the team had to be “ purified ” by the rejection of 
one or other of the two. Later it became clear that this 
process involves the experimenter in great difficulty, for it 
subjects him to the temptation to discover “ undue simi- 
larity ” between tests after he has found that their correla- 
tion breaks the hierarchy. Moreover, whole groups of 
tests were found to fail to conform ; and so group factors 
were admitted, though always, by the experimenter trained
in that school, with reluctance and in as small a number as 
possible. It had, however, become quite clear that the 
Theory of Two Factors in its original form had been super- 
seded by a theory of many factors, although the method 
of two factors remained as an analytical device for 
indicating their presence and for isolating them in com- 
parative purity. 

Under these circumstances it is not surprising that some 
workers turned their attention to the possibility of a method 
of multiple-factor analysis, by which any matrix of test 
correlations could be analysed direct into its factors 
(Garnett, 1919a and b). It was Professor Thurstone of
Chicago who saw that one solution to this problem could 
be reached by a generalization of Spearman’s idea of zero 
tetrad-differences. 

2. Rank of a matrix and number of factors.—We saw that
when all the tetrad-differences are zero, the correlations
can all be explained by one general factor, a tetrad being
formed of the intercorrelations of two tests with two other
tests, thus:

$$\begin{array}{c|cc} & 3 & 4 \\ \hline 1 & r_{13} & r_{14} \\ 2 & r_{23} & r_{24} \end{array}$$

and the tetrad-difference being

$$r_{13}\, r_{24} - r_{23}\, r_{14}.$$

Thurstone’s idea, though rather differently expressed by
him, can be based on a second, third, fourth . . . calculation
of certain tetrad-differences of tetrad-differences.
To explain this, let us consider the correlation coefficients
which three tests make with three others:

$$\begin{array}{c|ccc} & 4 & 5 & 6 \\ \hline 1 & r_{14} & r_{15} & r_{16} \\ 2 & r_{24} & r_{25} & r_{26} \\ 3 & r_{34} & r_{35} & r_{36} \end{array}$$


This arrangement of nine correlation coefficients might 
have been called a “nonad,” by analogy with the tetrad.
Actually, by mathematicians, it is called a “ minor deter- 
minant of order three” or more briefly a three-rowed 
minor ; a tetrad is in this nomenclature a “ minor of order 
two.” 

We can now, on the above three-rowed determinant, 
perform the following calculation. Choose the top left 
coefficient as “pivot,” and calculate the four tetrad- 
differences of which it forms part, namely : 


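$$\begin{array}{cc} (r_{14} r_{25} - r_{24} r_{15}) & (r_{14} r_{26} - r_{24} r_{16}) \\ (r_{14} r_{35} - r_{34} r_{15}) & (r_{14} r_{36} - r_{34} r_{16}) \end{array}$$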

These four tetrad-differences now themselves form a 
tetrad which can be evaluated. If it is zero, we say that 
the three-rowed determinant with which we started 
“ vanishes.” 

Exactly the same repeated process can be carried on with
larger minor determinants. For example, the minor of
order four here shown vanishes:

                 (.26)    .32     .38     .34
                  .42     .36     .62     .72
                  .44     .62     .66     .46
                  .45     .58     .63     .60

for its pivotal  (−.0408)   .0016    .0444
t.d.’s are         .0204    .0044   −.0300
                   .0068   −.0072    .0030

and then         (−.00021216)    .00031824
                   .00028288    −.00042432

and finally      zero


This process of continually calculating tetrads is called
“pivotal condensation.” The reader should be given a
word of warning here, that the end-result of this form of
calculation, if not zero, has to be divided by the product of
certain powers of the pivots, to give the value of the
determinant we began with. A routine method (Aitken, 1937a)
of carrying out pivotal condensation, including division
by the pivot at each step, is described in Chapter XIV,
pages 201ff.*
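The condensation of the four-rowed minor above can be repeated mechanically. The following is a minimal sketch, not Aitken’s routine: the helper name `condense` is hypothetical, exact fractions are used so that “zero” means exactly zero, and no division by the pivots is performed, so a non-zero end-result would not be the value of the determinant itself.

```python
from fractions import Fraction

def condense(m):
    """One pivotal condensation: every tetrad-difference that contains the
    top-left element (the pivot) of the array m."""
    pivot = m[0][0]
    return [[pivot * m[i][j] - m[i][0] * m[0][j] for j in range(1, len(m[0]))]
            for i in range(1, len(m))]

# The minor of order four, entered in hundredths so the arithmetic is exact.
minor = [[Fraction(x, 100) for x in row] for row in (
    (26, 32, 38, 34),
    (42, 36, 62, 72),
    (44, 62, 66, 46),
    (45, 58, 63, 60),
)]

step1 = condense(minor)   # the nine pivotal tetrad-differences
step2 = condense(step1)   # four tetrad-differences of tetrad-differences
step3 = condense(step2)   # a single number; zero means the minor vanishes
```

Each call shortens the array by one row and one column, so a minor of order four needs three condensations.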

We can in this way examine the minors of orders two,
three, four (and so on) of a correlation matrix, always
avoiding those diagonal cells which correspond to the
correlation of a test with itself. We may come to a point
at which all the minors of that order vanish. Suppose these
minors which all vanish are the minors of order five. We
then say that the “rank” of the correlation matrix is four
(with the exception of the diagonal cells). There then
exists the possibility that the “rank” of the whole correlation
matrix can be reduced to four by inserting suitable
quantities in the diagonal cells (see next section). The
“rank” of a matrix is the order of its largest† non-vanishing
minor. The tests can then be analysed into as many
common factors as the above reduced rank of their correlation
matrix (the rank, that is to say, apart from the diagonal
cells) plus a specific in each test.

* If the process gives, at an earlier stage than the end, a matrix
entirely composed of zeros, the rank of the original determinant is
correspondingly less, being equal to the number of condensations
needed to give zeros.
† “Largest” refers to the number of rows, not to the numerical
value.

3. Thurstone’s method used on a hierarchy.—Thurstone’s 
rule about the rank includes Spearman’s hierarchy as a 
special case, for in a hierarchy the tetrads (that is, the
minors of order two) vanish. The rank is therefore one,
and a hierarchical set of tests can be analysed into one 
common factor plus a specific in each. A simple way of 
introducing the reader to Thurstone’s hypothesis and also 
to his “ centroid ” method* of finding a set of factor satura- 
tions will be to use it first of all on the perfect Spearman 
hierarchy which we cited as an artificial example in our
first chapter. 


Tests     1      2      3      4      5      6
  1       .     .72    .63    .54    .45    .36
  2      .72     .     .56    .48    .40    .32
  3      .63    .56     .     .42    .35    .28
  4      .54    .48    .42     .     .30    .24
  5      .45    .40    .35    .30     .     .20
  6      .36    .32    .28    .24    .20     .


The first step in Thurstone’s method, after the rank has
been found, is to place in the blank diagonal cells numbers
which will cause these cells also to partake of the same rank
as the rest of the matrix, numbers which, for a reason which
will become clear later, are called “communalities.” In
our present Spearman example that rank is one, i.e. the
tetrads vanish. The communalities, therefore, must be
such numbers as will make also those tetrads vanish which
include a diagonal cell: this enables them to be calculated.
Let us, for example, fix our attention on the communality
of the first test, which we will designate $h_1^2$ (the reason for
the “square” will become apparent later). Then the
tetrad formed by Tests 1 and 2 with Tests 1 and 3 is:

$$\begin{array}{c|cc} & 1 & 3 \\ \hline 1 & h_1^2 & .63 \\ 2 & .72 & .56 \end{array}$$

and the tetrad-difference has to vanish. Therefore

$$.56\, h_1^2 - .72 \times .63 = 0$$
$$h_1^2 = .81$$

* We shall see why it is called the “centroid” method in the
next chapter.

Similarly all the communalities can be calculated, and
found to be:

.81    .64    .49    .36    .25    .16
(The observant reader will notice that they are the squares 
of the “ saturations ” of our first chapter; but let us con- 
tinue as though we had not noticed this.) 

The method of finding the saturations of each test with 
the first common factor is then to insert the communalities 
in the diagonal cells and add up the columns* of the 
matrix, thus: 


Original Correlation Matrix

(.81)    .72     .63     .54     .45     .36
 .72    (.64)    .56     .48     .40     .32
 .63     .56    (.49)    .42     .35     .28
 .54     .48     .42    (.36)    .30     .24
 .45     .40     .35     .30    (.25)    .20
 .36     .32     .28     .24     .20    (.16)

3.51    3.12    2.73    2.34    1.95    1.56      15.21


The column totals are then themselves added together
(15.21) and the square root taken (3.90). The “saturations”
of the first (and here the only) common factor
are then the columnar totals divided by this square root,
namely:

      3.51    3.12    2.73    2.34    1.95    1.56
      ----    ----    ----    ----    ----    ----
      3.90    3.90    3.90    3.90    3.90    3.90

or     .9      .8      .7      .6      .5      .4

as in the present instance we already know them to be.
(Very often in multiple-factor analysis the “saturation”
of a test with a factor is called the “loading,” and this is
a convenient place to introduce the new term.)

* This, the “centroid” method of finding a set of loadings, is not in
any way bound up with Thurstone’s theorem about the rank and
the number of common factors. It can be used, for example, with
unity in each diagonal cell, in which case it will give as many common
factors as there are tests, and no specific factors.

As applied to the hierarchical case, this method of 
finding the saturations or loadings had been devised and 
employed many years previously by Cyril Burt, though it 
is not quite clear how he would have filled in the blank 
diagonal cells (Burt, 1917, 58, footnote, and 1940, 448, 462). 
It should be explained that in actual practice Thurstone
and his followers do not calculate the minor determinants
to find the rank and the communality, for that would be
too laborious. Instead they adopt various approximations,
of which the simplest is to insert in each diagonal cell the
largest correlation coefficient of the column (see Section
10).
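The two steps of this section, filling the diagonal from a vanishing tetrad and then dividing the column totals by the square root of the grand total, can be sketched as follows. The function names `communality_rank_one` and `centroid_loadings` are hypothetical, and the communality rule assumes a perfect hierarchy (rank one), as in the present artificial example.

```python
import math

# Correlations of the six-test hierarchy; None marks the blank diagonal cells.
R = [
    [None, .72, .63, .54, .45, .36],
    [.72, None, .56, .48, .40, .32],
    [.63, .56, None, .42, .35, .28],
    [.54, .48, .42, None, .30, .24],
    [.45, .40, .35, .30, None, .20],
    [.36, .32, .28, .24, .20, None],
]

def communality_rank_one(R, i):
    """Solve the vanishing tetrad h_i^2 * r_jk - r_ij * r_ik = 0 for h_i^2,
    using the first two tests other than test i."""
    j, k = [t for t in range(len(R)) if t != i][:2]
    return R[i][j] * R[i][k] / R[j][k]

def centroid_loadings(M):
    """Column totals of M divided by the square root of the grand total."""
    n = len(M)
    col_totals = [sum(M[i][j] for i in range(n)) for j in range(n)]
    root = math.sqrt(sum(col_totals))
    return [c / root for c in col_totals]

h2 = [communality_rank_one(R, i) for i in range(6)]
full = [[h2[i] if i == j else R[i][j] for j in range(6)] for i in range(6)]
loadings = centroid_loadings(full)
```

With experimental data the tetrads would not vanish exactly and different choices of the two helper tests would give slightly different communalities, which is one reason approximations are used in practice.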

4. The second stage of the “centroid” method.—If there is
more than one common factor, the process goes on to
another stage. Even with our example we can show the 
beginning of this second stage, which consists in forming 
that matrix of correlations which the first factor alone 
would produce. This is done by writing the loadings 
along the two sides of a chequer board and filling every cell 
of the chequer board with the product of the loading of 
that row with the loading of that column, thus : 


First-factor Matrix

          .9      .8      .7      .6      .5      .4

  .9     .81     .72     .63     .54     .45     .36
  .8     .72     .64     .56     .48     .40     .32
  .7     .63     .56     .49     .42     .35     .28
  .6     .54     .48     .42     .36     .30     .24
  .5     .45     .40     .35     .30     .25     .20
  .4     .36     .32     .28     .24     .20     .16

This is the “first-factor matrix” which gives the parts of
the correlations due to the first factor. This matrix has now
to be subtracted from the original matrix to find the residues
which must be explained by further common factors.
In our present example the first-factor matrix is identical
with the original matrix and the residues are all zero. Only
the one common factor is therefore required. (Of course,
the reader will understand that in a real experimental
matrix the residues can never be expected to be exactly
zero: one is content when they are near enough to zero to
be due to chance experimental error.) Had the rank of
our original matrix of correlations been greater
than one, there would have been a matrix of residues.

Let us now make an artificial example with a larger
number of common factors, say three, which we can afterwards
use to illustrate the further stages of Thurstone’s
method. We can do this in an illuminating manner by
the aid of the oval diagrams described in Chapter I.

5. A three-factor example.—In Figure 10, a diagram of the
overlapping variances of four tests, let us insert three


to 10 (to make our arithmetical
work easy). No factor here is
common to all the four tests. The
factor with a variance of 4 runs
through Tests 1, 2, and 3. That
with a variance 3 runs through
Tests 2, 3, and 4. That with a


Moreover, we can put into our matrix the communalities
ing to our diagram. Each co in
matrix above, with communalities inserted. We can now
pretend that it is an experimental matrix, ready for the 
application of Thurstone’s method, as follows : 


                (.6)      .4       .4       .2
                 .4      (.7)      .7       .3      Original
                 .4       .7      (.7)      .3      experimental
                 .2       .3       .3      (.5)     matrix.

                1.6      2.1      2.1      1.3      = 7.1 = 2.6646²


1st Loadings   .6005    .7881    .7881    .4879     = 2.6646*


     .6005    (.3606)   .4733    .4733    .2930
     .7881     .4733   (.6211)   .6211    .3845     First-factor
     .7881     .4733    .6211   (.6211)   .3845     matrix.
     .4879     .2930    .3845    .3845   (.2380)
4870 29030 3845 3843 (2380) 


Here it is seen that the loadings of the first factor, when 
cross-multiplied in a chequer board, give a first-factor 
matrix which is not identical with the original experimental 
matrix, unlike the case of the former, hierarchical, matrix. 
Here (as we who made the matrix know) one factor will 
not suffice. We subtract the first-factor matrix from the 
original experimental matrix to see how much of the 
correlations still has to be explained, and how much of the 
“communalities” or communal variances. The latter
were

.6      .7      .7      .5

and of these amounts the first factor has explained

.3606   .6211   .6211   .2380


If we subtract the first-factor matrix, element by element,
from the original experimental matrix, we get the residual
matrix:

 ( .2394)   −.0733    −.0733    −.0930
  −.0733   ( .0789)    .0789    −.0845      First residual
  −.0733     .0789   ( .0789)   −.0845      matrix.
  −.0930    −.0845    −.0845   ( .2620)


* This check should always be applied. To avoid complication 
it is not printed in the later tables. It applies to the loadings with 
their temporary signs (see page 72). 
To this matrix we are now going to apply exactly the same
procedure as we applied to the original experimental
matrix, in order to find the loadings of the second factor.
But we meet at once with a difficulty. The columns of the
residual matrix add up exactly* to zero! This always
happens, and is indeed a useful check on our arithmetical 
work up to this point, but it seems to stop our further 
progress. 

To get over this difficulty we change temporarily the signs
of some of the tests in order to make a majority of the cells
of each column of the matrix positive. The best plan is to
change the sign of the test with most minuses in its column
and row, and so on until there is a large majority of plus
signs. Copy the signs on a separate paper, omitting the
diagonal signs, which never change. Since some signs
will change twice or thrice, use the convention that a
plus surrounded by a ring means minus, and if then
covered by an X means plus again. Near the end, watch
the actual numbers, for the minus signs in a column may
be very small. The object is to make the grand total
a maximum, and thus take out maximum variance with
each factor. We shall here, however, for simplicity adopt
an easier rule, i.e. to seek out the column whose total
regardless of signs is the largest, and then temporarily change
the signs of variables so as to make all the signs in that
column positive.

The sums of the residual columns, regardless of sign, are:


.4790    .3156    .3156    .5240


and therefore we must change the signs of tests so as to 
make all the signs in Column 4 positive; that is, we must 
change the signs of the first three tests. Since we change 
the three row signs, as well as the three column signs, this 
will leave a block of signs unchanged, but will make 
the last column and the last row all positive. We can 
then proceed as shown in the following table.


* When enough decimals have been retained. In practice there
may be a discrepancy in the last decimal place.
† Changing the sign of Test 4 would here have the same result,
but for uniformity of routine we stick to the letter of the rule.
                 .2394     −.0733     −.0733    (−).0930
                −.0733      .0789      .0789    (−).0845     First residual
                −.0733      .0789      .0789    (−).0845     matrix with
              (−).0930   (−).0845   (−).0845       .2620     changed signs.

                 .1858      .1690      .1690       .5240     = 1.0478
                                                             = 1.0236²
2nd              .1815      .1651      .1651       .5119     With temporary
Loadings                                                     signs.

     .1815       .0329      .0300      .0300       .0929
     .1651       .0300      .0273      .0273       .0845     Second-factor
     .1651       .0300      .0273      .0273       .0845     matrix.
     .5119       .0929      .0845      .0845       .2620

                 .2065     −.1033     −.1033       .0001
                −.1033      .0516      .0516       .         Second residual
                −.1033      .0516      .0516       .         matrix.
                 .0001      .          .           .


On the matrix with these temporarily changed signs we 
then operate exactly as we did on the original experimental 
matrix, and obtain second-factor loadings which (with 
temporary signs) are— 


.1815    .1651    .1651    .5119


The second-factor matrix, that is, the matrix showing 
how much correlation is due to the second factor, is then 
made on a chequer board still using the temporary signs, 
and subtracted from the previous matrix of residues (with 
its temporary signs, not with its first signs) to find the 
residues still remaining, to be explained by further factors. 
In the present instance we see that the whole variance of 
the fourth test entirely disappears, and also all the correla- 
tions in which that test is concerned.* This test, therefore, 
is fully explained by the two factors already extracted. 
Only the first three test variances remain unexhausted, 
and their correlations. Again the columns of the residual 
matrix sum exactly to zero. Following our rule, the signs 
of Tests 2 and 3 have to be temporarily changed before
the process can continue. After these changes of sign the
second residual matrix is as follows, and the same operation
as before is again performed on it:


                  .2065    (−).1033   (−).1033     .      Second residual
               (−).1033       .0516      .0516     .      matrix with signs
               (−).1033       .0516      .0516     .      temporarily
                  .           .          .         .      changed.

                  .4131       .2065      .2065     .      = .8261 = .9089²

3rd Loadings      .4545       .2272      .2272     .      with temporary
                                                          signs.


* When enough decimals are retained. We shall treat the
.0001 as zero.


With these third-factor loadings we can now calculate the 
variances and correlations due to the third factor : and we 
find these are exactly equal to the second residual matrix. 
On subtracting, the third residual matrix we obtain is 
entirely composed of zeros. (In a practical example we 
should be content if it was sufficiently small.) We thus 
find (as our construction of the artificial tests entitled us to 
expect) that the matrix of correlations can be completely 
explained by three common factors. 
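The whole routine of this section, including the simplified rule for temporary sign changes, can be sketched in a few lines. This is an illustrative sketch only: the name `centroid_analysis` and the tolerance `tol` are assumptions, ties for the largest column are broken by taking the first, and because the loadings are not rounded to four places at each stage the results agree with the printed ones only to about three decimals.

```python
import math

def centroid_analysis(R, n_factors, tol=1e-12):
    """Extract centroid factors from R (communalities already in the diagonal).
    Returns the loadings with their proper signs and the final residual matrix,
    which is left with its temporary signs."""
    n = len(R)
    M = [row[:] for row in R]
    cum = [1] * n                      # accumulated temporary sign changes
    factors = []
    for _ in range(n_factors):
        # Simplified rule: take the column whose total regardless of sign is
        # largest, and change the signs of tests to make it all positive.
        abs_totals = [sum(abs(M[i][j]) for i in range(n)) for j in range(n)]
        c = abs_totals.index(max(abs_totals))
        s = [-1 if M[i][c] < -tol else 1 for i in range(n)]
        M = [[s[i] * s[j] * M[i][j] for j in range(n)] for i in range(n)]
        cum = [cum[i] * s[i] for i in range(n)]
        # Loadings with temporary signs: column totals over root of grand total.
        totals = [sum(M[i][j] for i in range(n)) for j in range(n)]
        root = math.sqrt(sum(totals))
        temp = [t / root for t in totals]
        factors.append([cum[i] * temp[i] for i in range(n)])
        # Subtract the chequer-board matrix of this factor.
        M = [[M[i][j] - temp[i] * temp[j] for j in range(n)] for i in range(n)]
    return factors, M

R = [
    [.6, .4, .4, .2],
    [.4, .7, .7, .3],
    [.4, .7, .7, .3],
    [.2, .3, .3, .5],
]
factors, residual = centroid_analysis(R, 3)
```

Keeping a running product of the sign changes in `cum` is one way of doing what the text does on paper when it returns from the temporary signs to the correct ones.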

After the analysis has been completed, some care is 
needed in returning from the temporary signs of the load- 
ings to the correct signs. The only safe plan is to write 
down first of all the loadings with their temporary signs 
as they came out in the analysis. In our present example 
these happen to be all positive, though that will not 
always occur. 


Loadings with Temporary Signs

Test       I         II        III
 1       .6005     .1815     .4545
 2       .7881     .1651     .2272
 3       .7881     .1651     .2272
 4       .4879     .5119      .



Now, in obtaining Loadings II the signs of Tests 1, 2, and
3 were changed. We must, therefore, in the above table
reverse the signs of the loadings of these three tests in
Column II and each later column. Then in obtaining
Loadings III the signs of Tests 2 and 3 were changed; that
is, in our case changed back to positive. The loadings
with their proper signs are therefore as shown in the first
three columns of this table:


Loadings of the Factors (Signs Replaced)

Test       I          II         III       Specific
 1       .6005     −.1815     −.4545       .6325
 2       .7881     −.1651     +.2272       .5477
 3       .7881     −.1651     +.2272       .5477
 4       .4879      .5119       .          .7071


In this table each column of loadings, for the common
factors after the first, adds up to zero. The loading of the
specific is found from the fact that in each row the sum of
the squares must be unity, being the whole variance of the
test. The “inner product”* of each pair of rows gives the
correlation between those two tests (Garnett, 1919a).
Thus:

$$r_{12} = .6005 \times .7881 + .1815 \times .1651 - .4545 \times .2272 = .4000$$



in agreement with the entry in the original correlation 
matrix. With artificial data like the present, the analysis 
results in loadings which give the correlations back exactly. 
It will be seen that all the signs in any column of the 
table of loadings can be reversed without making any 
change in the inner products of the rows ; that is, without 
altering the correlations. We would usually prefer, therefore,
to reverse the signs of a column like our Column III,

so as to make its largest member positive. 

The amount which each factor contributes to the variance
of the test is indicated by the square of its loading in that
test. The sum of the squares of the three common-factor
loadings gives the “communality” which we originally
deduced from Figure 10 and inserted in the diagonal cells of
our original correlation matrix. These facts can be better
seen if we make a table of the squares of the above loadings:


                                                        Specific
Test       I         II        III     Communality     Variance     Total
 1       .3606     .0329     .2065       .6000          .4000         1
 2       .6211     .0273     .0516       .7000          .3000         1
 3       .6211     .0273     .0516       .7000          .3000         1
 4       .2380     .2620      .          .5000          .5000         1
Total   1.8408     .3495     .3097      2.5000         1.5000         4


* By the “inner product” of two series of numbers is meant the
sum of their products in pairs. Thus the inner product of the two
sets
         a    b    c    d
and      A    B    C    D
is       aA + bB + cC + dD
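The properties just described (inner products of rows giving correlations, squared loadings summing to the communality, and a column’s signs being reversible) can be verified on the printed loadings. The sketch below is illustrative; the helper name `inner` is hypothetical, and since the loadings are given to four places the results agree with .4000, .6000, and so on only to about three decimals.

```python
import math

# Common-factor loadings with their proper signs (factors I, II, III).
loadings = [
    [.6005, -.1815, -.4545],
    [.7881, -.1651,  .2272],
    [.7881, -.1651,  .2272],
    [.4879,  .5119,  .0],
]

def inner(u, v):
    """Inner product: the sum of the products in pairs."""
    return sum(a * b for a, b in zip(u, v))

r_12 = inner(loadings[0], loadings[1])                 # correlation of Tests 1 and 2
communalities = [inner(row, row) for row in loadings]  # sums of squared loadings
specifics = [math.sqrt(1.0 - h2) for h2 in communalities]

# Reversing every sign in Column III leaves each inner product unchanged.
flipped = [[a, b, -c] for a, b, c in loadings]
r_12_flipped = inner(flipped[0], flipped[1])
```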


6. Comparison of the analysis with the diagram.—The
reader has probably been turning from this calculation of 
the factor loadings back to the four-oval diagram with 
which we started (page 69), to detect any connexion ; and 
has been disappointed to find none. The fact is that the 
analysis to which the Thurstone method has led us is, 
except that it too has three common factors, a different 
analysis from that which the original diagram naturally 
invites. That diagram gave for the variance due to each 
factor the following : 


Variance contributed by Each Factor

                                                   Specific
Test       I       II      III     Communality     Variance     Total
 1        .4       .       .2         .6             .4           1
 2        .4      .3       .          .7             .3           1
 3        .4      .3       .          .7             .3           1
 4        .       .3       .2         .5             .5           1
Totals   1.2      .9       .4        2.5            1.5           4


and the factor loadings are the positive square roots of
these.

Loadings of the Factors

Test       I         II        III       Specifics
 1       .6325       .        .4472       .6324
 2       .6325     .5477       .          .5477
 3       .6325     .5477       .          .5477
 4         .       .5477      .4472       .7071

The only points in common between the two analyses are 
that they both have the same communalities (and therefore 
the same specific variances) and the same number of com- 
mon factors. The Thurstone analysis has two general 
factors (running through all four tests), while the diagram 
had none: and the Thurstone analysis has several negative 
loadings, while the diagram had none. We shall see later
that Thurstone, after arriving at this first analysis,
endeavours to convert it into an analysis more like that of
our diagram, with no negative loadings and no completely
general factors. This is one of the most difficult yet
essential parts of his method.

7. Analysis into two common factors.—When we began
our analysis of the matrix of correlations corresponding to
Figure 10, we simply put the communalities suggested by
that figure into the blank diagonal cells. That served to
illustrate the fact that the Thurstone method of calculation
will bring out as many factors as correspond to the
communalities used, here three factors. But it disregarded
(intentionally for the purpose of the above illustration) a
cardinal point of Thurstone’s theory that we must seek
for the communalities which make the rank of the matrix a
minimum, and therefore the number of common factors a
minimum. We simply accepted the communalities suggested
by the diagram. Let us now repair our omission
and see if there is not a possible analysis of these tests into 
fewer than three common factors. There is no hope of 
reducing the rank to one, for the original correlations give 
two of the three tetrads different from zero, and we may 
(in an artificial example) assume that there are no experimental
or other errors. But there is nothing in the experimental
correlations to make it certain that rank 2
cannot be attained. With only four tests (far too few, be
it remembered, for an actual experiment) there is no minor
of order three entirely composed of experimentally obtained
correlations. It may then be the case that communalities
can be found which reduce the rank to 2. Indeed, as we
shall see presently, many sets of communalities will do so,
of which one is shown here:


(.2667)    .4      .4      .2
  .4      (.7)     .7      .3
  .4       .7     (.7)     .3
  .2       .3      .3     (.15)


These communalities .2667 (more exactly 12/45), .7, .7, and
.15 make every three-rowed minor exactly zero. For example,
the minor

(.2667)    .4      .2
  .4      (.7)     .3
  .2       .3     (.15)

becomes by “pivotal condensation”:

 .0267     0
  0        0

and finally 0.

It must, therefore, be possible to make a four-oval
diagram, showing only two common factors, and indeed
more than one such diagram can be found. One is shown
in Figure 11.
This gives exactly the same correlations. For example:

$$r_{23} = \frac{14}{\sqrt{20 \times 20}} = \frac{14}{20} = .7$$

$$r_{24} = \frac{12}{\sqrt{20 \times 80}} = \frac{12}{40} = .3$$

It also gives the communalities .2667, .7, .7, .15. For
example, in Test 1, variance to the amount of 12 out of
45 is communal, and 12/45 = .2667.

The insertion of these communalities, therefore, in the 
matrix of correlations ought to give a matrix which only 
two applications of Thurstone’s calculation should com- 
pletely exhaust. The reader is advised to carry out the 
calculation as an exercise. He will find for the first-factor
loadings:

.5000    .8290    .8290    .3750

and if in the first residual matrix, following our rule, he
changes temporarily the signs of Tests 2 and 3, the
second-factor loadings will be

.1291    −.1128    −.1128    .0968


The second residual matrix will be found to be exactly
zero in each of its sixteen cells. The variance (square of
the loading) contributed by each factor to each test is then
in this analysis:


Variance contributed by Each Factor

                                           Specific
Test        I         II      Communality  Variance     Total
 1        .2500     .0167       .2667       .7333         1
 2        .6873     .0127       .7000       .3000         1
 3        .6873     .0127       .7000       .3000         1
 4        .1406     .0094       .1500       .8500         1
Totals   1.7652     .0515      1.8167      2.1833         4


If we now compare these analyses, we see that the three
common factors of the previous analysis “took out,” as
the factorial worker says, a variance of 2.5 of the total 4,
leaving 1.5 for the specifics. The present analysis leaves
2.1833 for the specifics, which here form a larger part of
the four tests.
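That the communalities 12/45, .7, .7, and .15 really do reduce the rank to two can be confirmed by evaluating every three-rowed minor of the completed matrix. The sketch below uses exact fractions; the helper `det3` and the variable names are hypothetical, and the determinant is expanded directly rather than by pivotal condensation.

```python
from fractions import Fraction
from itertools import combinations

# Correlation matrix with the rank-two communalities in the diagonal.
R = [
    [Fraction(12, 45), Fraction(4, 10), Fraction(4, 10), Fraction(2, 10)],
    [Fraction(4, 10),  Fraction(7, 10), Fraction(7, 10), Fraction(3, 10)],
    [Fraction(4, 10),  Fraction(7, 10), Fraction(7, 10), Fraction(3, 10)],
    [Fraction(2, 10),  Fraction(3, 10), Fraction(3, 10), Fraction(15, 100)],
]

def det3(m):
    """Determinant of a three-rowed array by direct expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# All sixteen three-rowed minors: any three rows with any three columns.
minors3 = [
    det3([[R[i][j] for j in cols] for i in rows])
    for rows in combinations(range(4), 3)
    for cols in combinations(range(4), 3)
]
all_vanish = all(m == 0 for m in minors3)

# One two-rowed minor (a tetrad) that does not vanish, so the rank is not one.
tetrad = R[0][0] * R[1][1] - R[0][1] * R[1][0]
```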

8. Alexander’s rotation.—We saw in Section 6 that the
Thurstone method there led to an analysis which was 
different from the analysis corresponding to the diagram 
with which we began. That is also the case with the 
present analysis into two common factors—the very fact 
that it gives the second factor two negative loadings shows 
this, for the diagram (Figure 11) corresponds to positive 
loadings only. We said, too, in Section 6 that a difficult 
part of Thurstone's method was the conversion of the 
loadings into new and equivalent loadings which are all 
positive. This will form the subject of a later and more
technical chapter; but a simple illustration of one method 
of conversion (or “rotation ” as it is called, for a reason 
which will become clear later) can be given from our present 
example. It is a method which can be used only if we have 
reason to think that one of our tests contains only one 
common factor (Alexander, 1935, 144). Let us suppose in 
our present case that from other sources we know this fact 
about Test 1. The centroid analysis has given us the 
loadings shown in the first two columns of this table : 


          Unrotated                        Rotated             Rotated
Test      Loadings       Communality       Loadings            Loadings
          I        II                      I*       II*        I**      II**
 1      .5000    .1291      .2667        .5164      .        .4781    .1952
 2      .8290   −.1128      .7000        .7746    .3162      .8367      .
 3      .8290   −.1128      .7000        .7746    .3162      .8367      .
 4      .3750    .0968      .1500        .3873      .        .3586    .1464


The communalities are also shown; they are the sums of
the squares of the loadings. If now we know or decide to
assume that Test 1 has really only one common factor, and
if we want to preserve the communalities shown, then the
loading of factor I* in Test 1 must be the square root of
.2667, namely .5164.

The loadings of factor I* in the other three tests can
now be found from the fact that they must give the
correlations of those tests with Test 1, since Test 1 has no
second factor to contribute. The loadings shown in
column I* are found in this way: for example, .7746 is
the quotient of .5164 divided into r12 (.4), and .3873 is
similarly r14 (.2) divided by .5164.

The contributions of factor I* to the communalities are
obtained by squaring these loadings. In Test 1, we
already know that factor I* exhausts the communality, for
that is how we found its loading. We discover that in
Test 4, factor I* likewise exhausts the communality, for
the square of .3873 is .1500. The other two tests, however,
have each an amount of communality remaining equal to
.1000 (i.e. .7000 - .7746²). The square root of .1000,
therefore (.3162), must be the loading of factor II* in
Tests 2 and 3. The double column of loadings ought now
to give all the correlations of the original correlation
matrix, and we find that it does so. Thus, e.g.:

        r23 = .7746 × .7746 + .3162 × .3162 = .7000
    and r24 = .7746 × .3873                 = .3000

Moreover, the analysis into factors I* and II* corresponds
exactly to Figure 11. For example, the loading of
factor II* in Test 2 in that diagram is the square root of
2/20 (.3162); and the loading of factor I* in Test 4 is the
square root of 12/80 (.3873).
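
The rotation just described can be checked with a few lines of Python. This is only a sketch under stated assumptions: it handles the two-factor case alone, it takes the positive square root for the second factor (which happens to suit this example), and the names `rotate_on_pure_test` and `reproduced_r` are illustrative, not part of any published procedure.

```python
import math

# Correlations of the four tests in the worked example (diagonal unused).
R = [
    [0.0, 0.4, 0.4, 0.2],
    [0.4, 0.0, 0.7, 0.3],
    [0.4, 0.7, 0.0, 0.3],
    [0.2, 0.3, 0.3, 0.0],
]
# Communalities of the two-factor analysis; .2667 in the table is 4/15 rounded.
COMMUNALITY = [4 / 15, 0.70, 0.70, 0.15]


def rotate_on_pure_test(r, communality, pure):
    """Two-factor loadings when test `pure` is assumed to hold one factor only."""
    root = math.sqrt(communality[pure])
    # First factor: it alone must reproduce every correlation with the pure test.
    first = [root if i == pure else r[i][pure] / root for i in range(len(r))]
    # Second factor: whatever communality is left over (positive root assumed).
    second = [math.sqrt(max(h2 - a * a, 0.0)) for h2, a in zip(communality, first)]
    return first, second


def reproduced_r(first, second, i, j):
    """Correlation of tests i and j implied by the two columns of loadings."""
    return first[i] * first[j] + second[i] * second[j]


star_1, star_2 = rotate_on_pure_test(R, COMMUNALITY, pure=0)    # columns I*, II*
dstar_1, dstar_2 = rotate_on_pure_test(R, COMMUNALITY, pure=1)  # columns I**, II**

for name, column in [("I*", star_1), ("II*", star_2),
                     ("I**", dstar_1), ("II**", dstar_2)]:
    print(name, [round(x, 4) for x in column])
```

Calling `reproduced_r` on either pair of columns returns the original correlations, which is the check recommended in the text; with more than two common factors a single pure test would no longer fix the rotation.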

If, however, the experimenter had reasons for thinking
that Test 2 (not Test 1) was free from the second common
factor, his “rotation” of the loadings would have given a
different result, shown in the table on page 79 in columns
I** and II**. This set of loadings also gives the correct
communalities and the experimental correlations, but does
not correspond to Figure 11. A diagram can, however, be
constructed to agree with it (Figure 12) and the reader is
advised to check the agreement by calculating from the
diagram the loadings of each factor, the communalities of
each test, and the correlations.

Figure 12.

We have had, in Figures 10, 11, and 12, three different 
analyses of the same matrix of correlations. If with 
Thurstone we decide that analyses must always use the 
minimal number of common factors, we will reject Figure 10. 
Between Figures 11 and 12, however, this principle makes 
no choice. Much of the later and more technical part of 
Thurstone’s method is taken up with his endeavours to 
lay down conditions which will make the analysis unique.

9. Unique communalities.—The first requirement for a
unique analysis is that the set of communalities which gives
the lowest rank should be unique, and this is not the case
with a battery of only four tests and minimal rank 2, like
our example. There are many different sets of
communalities, all of which reduce the matrix of correlations
of our four tests to rank 2. If, for example, we fix the
first communality arbitrarily, say at .5, we can condense
the determinant to one of order 3 by using .5 as a pivot
(as on page 65) except that the diagonal of the smaller
matrix will be blank:

    (.5)    .4     .4     .2
     .4     .      .7     .3
     .4     .7     .      .3
     .2     .3     .3     .

            .      .19    .07
            .19    .      .07
            .07    .07    .

Each cell of the smaller matrix is the pivot times the
corresponding correlation, less the product of the two
correlations with Test 1; for example, .5 × .7 - .4 × .4 = .19.
We can then fill the diagonal of the smaller matrix with
numbers which will make each of its tetrads zero, namely

    .19    .19    .0258

and then, working back to the original matrix, find the
communalities

    .5     .7     .7     .1316

which make its rank exactly 2. We can similarly insert
different numbers for the first communality and calculate
different sets of communalities, any one set of which will
reduce the rank to 2. In this way we can go from 1.0
down to .22951 for the first communality without obtaining
inadmissible magnitudes for the others. Some sets
are given in the following table*:


      1        2      3       4          Sum
    1.0       .7     .7     .12963     2.52963
     .5       .7     .7     .13158     2.03158
     .3       .7     .7     .14        1.84
     .2667    .7     .7     .15        1.8167
     .25      .7     .7     .1667      1.8167
     .24      .7     .7     .20        1.84
     .22951   .7     .7    1.0         2.62951
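
The rows of this table can be recomputed by following the same three steps (condense on the chosen first communality, fill the blank diagonal so that every tetrad vanishes, work back to the original matrix). The sketch below is an illustration for this four-test example only; the function name is hypothetical.

```python
# Correlations of the four tests in the worked example.
R12, R13, R14, R23, R24, R34 = 0.4, 0.4, 0.2, 0.7, 0.3, 0.3


def communalities_given_first(h1):
    """Communalities of Tests 2, 3, 4 giving rank 2 when Test 1's is fixed at h1."""
    # Condense on the pivot h1: off-diagonal cells of the smaller matrix.
    c23 = h1 * R23 - R12 * R13
    c24 = h1 * R24 - R12 * R14
    c34 = h1 * R34 - R13 * R14
    # Diagonal cells that make every tetrad of the smaller matrix vanish.
    d2 = c23 * c24 / c34
    d3 = c23 * c34 / c24
    d4 = c24 * c34 / c23
    # Work back: each diagonal cell d equals h1 * h - r**2 for its test.
    return [(d2 + R12 ** 2) / h1, (d3 + R13 ** 2) / h1, (d4 + R14 ** 2) / h1]


for h1 in (1.0, 0.5, 0.3, 0.25, 0.24):
    h2, h3, h4 = communalities_given_first(h1)
    print(h1, round(h2, 5), round(h3, 5), round(h4, 5),
          round(h1 + h2 + h3 + h4, 5))
```

A first communality of exactly 4/15 makes two cells of the condensed matrix vanish, so the divisions above break down there and that row has to be obtained as a limiting case; the sketch deliberately leaves it out.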


If, however, we search for and find a fifth test to add to
the four, which will still permit the rank to be reduced to
2, this fifth test will fix the communalities at some point
or other within the above range. Suppose that this test
gave the correlations shown in the last row and column:

          1        2        3        4        5
    1     .       .4       .4       .2      .5883
    2    .4       .        .7       .3      .2852
    3    .4       .7       .        .3      .2852
    4    .2       .3       .3       .       .1480
    5    .5883    .2852    .2852    .1480    .

If we now try to find communalities to reduce this
matrix to rank 2 (as can be done), we find only the one
set:

    .7     .7     .7     .13030     .5

The reader can try this by assigning an arbitrary value for
the first one,† and then condensing the matrix on the lines
employed above, when he will always find some obstacle
in the way unless he chooses .7.

* The circumstance that the communalities of Tests 2 and 3
remain fixed and alike is due to these tests being identical except for
their specific. This lightens the arithmetic, but would not occur
in practice.

† Alternatively, the communalities (which are now unique) can
be found by equating to zero those three-rowed minors which have
only one element in common with the diagonal. In this connexion
see Ledermann, 1937a.

Try, for example, .5 for the first communality:


    (.5)      .4        .4        .2        .5883
     .4       .         .7        .3        .2852
     .4       .7        .         .3        .2852
     .2       .3        .3        .         .1480
     .5883    .2852     .2852     .1480     .

              (x)       .19       .07      -.09272
              .19       .         .07      -.09272
              .07       .07       .        -.04366
             -.09272   -.09272   -.04366    .
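
The condensation above, and the values that different tetrads would demand of the blank pivot cell x in the lower matrix, can be recomputed as follows. This is a sketch for the five-test example only; `condense` and `values_demanded_of_x` are illustrative names, and only two of the possible tetrads are examined.

```python
# Correlations of the five tests; diagonal cells (communalities) are unknown.
R5 = [
    [None, 0.4, 0.4, 0.2, 0.5883],
    [0.4, None, 0.7, 0.3, 0.2852],
    [0.4, 0.7, None, 0.3, 0.2852],
    [0.2, 0.3, 0.3, None, 0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, None],
]


def condense(matrix, pivot):
    """Pivotal condensation on cell (0, 0); diagonal cells are left blank (None)."""
    n = len(matrix)
    return [
        [None if i == j else pivot * matrix[i][j] - matrix[i][0] * matrix[0][j]
         for j in range(1, n)]
        for i in range(1, n)
    ]


def values_demanded_of_x(lower):
    """Values of the blank first cell x required by two different tetrads."""
    # Tetrad on rows (0, 1), columns (0, 2): x * c[1][2] - c[0][2] * c[1][0] = 0
    from_first = lower[0][2] * lower[1][0] / lower[1][2]
    # Tetrad on rows (0, 2), columns (0, 3): x * c[2][3] - c[0][3] * c[2][0] = 0
    from_second = lower[0][3] * lower[2][0] / lower[2][3]
    return from_first, from_second


for first_communality in (0.5, 0.7):
    lower = condense(R5, first_communality)
    print(first_communality, [round(v, 5) for v in values_demanded_of_x(lower)])
```

With .5 as the first communality the two tetrads ask for different values of x, whereas with .7 they agree to within the rounding of the correlations.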


Now, if the upper matrix is to be of rank 2, the
second condensation must give only zeros (see footnote,
page 65). But if we fix our attention on different tetrads
in the lower matrix which contain the pivot x, we see that
they give, if they have to be zero, incompatible values for
x. Thus from one tetrad we get x = .19, from another
x = .14866. With .5 as first communality, rank 2
cannot be attained. With five tests (or more), if rank 2
can be attained at all, it can be by only one unique set of
communalities. (Just as it took three tests to enable the
saturations with Spearman's g to be calculated, so it takes
five tests to enable communalities due to two common
factors to be calculated.) For larger numbers of common
factors, the number of tests required to make the set of
communalities unique is shown in the following table
(Vectors, 77). The lower numbers* are given by the
formula

    n ≥ [(2r + 1) + √(8r + 1)] / 2

    r  Factors    1    2    3    4    5    6    7    8    9   10
    n  Tests      3    5    6    8    9   10   12   13   14   15


* With six tests the communalities which reduce to rank 3 are 
not necessarily unique, for there are, or there may be, two sets of 
them. See Wilson and Worcester, 1939. 

I think the ambiguity, which is not practically important, only 
occurs when n is exactly equal to the formula, e.g. when r = 3, 6, 
10, etc.
If we were actually confronted with the matrix of correlations
shown on page 69, and asked what the communalities
were which reduced it to the lowest possible rank, we would 
find it very unsatisfactory to have to guess at random and 
try each set ; and our embarrassment would be still greater 
if there were more tests in the battery, as would actually be 
the case in practice. There would also be sampling error 
(which in this our preliminary description of Thurstone’s 
method we are assuming to be non-existent). Under these 
circumstances, devices for arriving rapidly at approximate 
values of the communalities are very desirable. 

10. Method of approximating to the communalities.—Thurstone
has described many ways of estimating the communalities,
and articles still issue from his laboratory on
this subject. (He points out, however, that if the number
of tests is fairly large, an exact estimate is not very
important, and can in any case be improved by iteration, using
the sums of squares of the loadings for a new estimate.)

The simplest plan is to use as an approximate communality
the largest correlation coefficient in the column.
That this is plausible can be seen from a consideration of the
case where there is only one factor, when the communality
of Test 1 would be r12 r13 / r23, which is likely to be roughly
equal to either r12 or r13 if these tests correlate highly with
Test 1 and probably therefore with each other.

We shall illustrate this, the easiest, method on the same 
example as we used above, for the sake of comparison and 
for ease in arithmetical computation, even although that 
example is really an exact and artificial one unclouded by 
sampling error. Inserting then the highest coefficients in 
each column we get : 


            (.5883)   .4       .4       .2       .5883
             .4      (.7)      .7       .3       .2852
             .4       .7      (.7)      .3       .2852
             .2       .3       .3      (.3)      .1480
             .5883    .2852    .2852    .1480   (.5883)

Sums        2.1766   2.3852   2.3852   1.2480   1.8950  = 10.0900
                                                        = 3.1765²
First
Loadings     .6852    .7509    .7509    .3929    .5966


The communalities which really give the minimum rank
are, as we saw on page 82:

    .7     .7     .7     .1303     .5

and the correct first-factor loadings obtained by their use:

    .7257    .7564    .7564    .3420    .5729

With a large battery the difference between the loadings
obtained by the approximation and by the correct
communalities would be much less. For the “centroid” method
depends on the relative totals of the columns of the correla- 
tion matrix; and when there are twenty or more tests, 
these relative totals will not be seriously changed by the 
exact value given to the communality in the column. 
When the number of tests is large, the influence of the one 
communality in each column is swamped by the influence 
of the numerous correlations. 
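
The comparison of first-factor loadings can be repeated with a short Python sketch. It assumes only what the text describes (column sums, including the diagonal cell, divided by the square root of the grand total); the function name is illustrative.

```python
import math

# Correlations of the five tests (diagonal cells supplied separately).
R = [
    [0.0, 0.4, 0.4, 0.2, 0.5883],
    [0.4, 0.0, 0.7, 0.3, 0.2852],
    [0.4, 0.7, 0.0, 0.3, 0.2852],
    [0.2, 0.3, 0.3, 0.0, 0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]


def first_centroid_loadings(r, communalities):
    """Column sums (communality in the diagonal) over the root of the grand total."""
    n = len(r)
    sums = [
        communalities[j] + sum(r[i][j] for i in range(n) if i != j)
        for j in range(n)
    ]
    root_total = math.sqrt(sum(sums))
    return [s / root_total for s in sums]


# Guess: the largest correlation in each column.
guessed = [max(r_ij for i, r_ij in enumerate(col) if i != j)
           for j, col in enumerate(zip(*R))]
true_values = [0.7, 0.7, 0.7, 0.1303, 0.5]

approximate = first_centroid_loadings(R, guessed)
correct = first_centroid_loadings(R, true_values)
print([round(x, 4) for x in approximate])
print([round(x, 4) for x in correct])
```

Enlarging the battery adds more off-diagonal terms to each column sum, which is why the single guessed diagonal cell matters less and less.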

The process now goes on as on page 71, and the residuals 
left after subtraction of the first-factor matrix check by 
summing in each column to zero, as there. 

Before, however, proceeding any farther, in this approximate
method we delete the quantities in the diagonal (the
residues of the guessed communalities) and replace them by
the largest coefficient in the column regardless of its sign,
which we change to plus in the diagonal cell if it is negative
in its own cell.* The reason for this is apparent, especially
when, as may and does happen, the existing diagonal
residues are negative, which is theoretically impossible.
For although the guessing of the first communalities does
not in a large battery make much difference to the
first-factor loadings, it may make a big difference to the diagonal
residues. If the battery is very large indeed, our
first-factor loadings would come out much the same, even if we
entered zero for every communality, but the diagonal
residues would then all be negative. In short, the diagonal
residues are much the least trustworthy part of the
calculation when approximate communalities are used, and it is
better to delete them at each stage and make a new
approximation.

* It is necessary to keep an eye on the fact that what is inserted
must not, with the squares of the previous loadings of that test,
amount to more than unity.

11. Illustrated on the example.—To make this clearer, the
whole approximate process is here set out for our small 
example as far as the second residual matrix. The ex- 
planations printed alongside the calculation will make 
each stage clear. It is important to form the residual 
matrices exactly as instructed, as otherwise the check of 
the columns summing to zero will not work. In practice, 
certainly if a calculating machine were being used, several 
of the matrices here printed for clearness would be omitted ; 
for example, with a machine one would go straight from 
A to C, while D and E would be made by actually altering 
C itself : 


A.          (.5883)   .4       .4       .2       .5883
             .4      (.7)      .7       .3       .2852    Largest r of
             .4       .7      (.7)      .3       .2852    column inserted
             .2       .3       .3      (.3)      .1480    in diagonal cell.
             .5883    .2852    .2852    .1480   (.5883)

Sums        2.1766   2.3852   2.3852   1.2480   1.8950  = 10.0900
                                                        = 3.1765²

Loadings I   .6852    .7509    .7509    .3929    .5966  = 3.1765

B.  .6852 | (.4695)   .5145    .5145    .2692    .4088
    .7509 |  .5145   (.5639)   .5639    .2950    .4480    First-factor
    .7509 |  .5145    .5639   (.5639)   .2950    .4480    matrix.
    .3929 |  .2692    .2950    .2950   (.1544)   .2344
    .5966 |  .4088    .4480    .4480    .2344   (.3559)

C.          (.1188)  -.1145   -.1145   -.0692    .1795
            -.1145   (.1361)   .1361    .0050   -.1628    First residual
            -.1145    .1361   (.1361)   .0050   -.1628    matrix.
            -.0692    .0050    .0050   (.1456)  -.0864    A - B
             .1795   -.1628   -.1628   -.0864   (.2324)

             .0001   -.0001   -.0001    .0000   -.0001    Columns check
                                                          to zero.

D.          (.1795)  -.1145   -.1145   -.0692    .1795    Largest r of each
            -.1145   (.1628)   .1361    .0050   -.1628    column (regardless
            -.1145    .1361   (.1628)   .0050   -.1628    of sign) inserted
            -.0692    .0050    .0050   (.0864)  -.0864    in each diagonal
             .1795   -.1628   -.1628   -.0864   (.1795)   cell.

             .6572    .5812    .5812    .2520    .7710    Sum disregarding
                                                          signs.

E.          (.1795)   .1145    .1145    .0692    .1795    Signs of Tests 2,
             .1145   (.1628)   .1361    .0050    .1628    3, and 4 changed
             .1145    .1361   (.1628)   .0050    .1628    to make largest
             .0692    .0050    .0050   (.0864)   .0864    column (.7710)
             .1795    .1628    .1628    .0864   (.1795)   all positive.

Algebraic
Sum          .6572    .5812    .5812    .2520    .7710  = 2.8426
                                                        = 1.6860²

Loadings II  .3898    .3447    .3447    .1495    .4573    (With temporary
                                                          signs.)

F.  .3898 | (.1519)   .1344    .1344    .0583    .1783
    .3447 |  .1344   (.1188)   .1188    .0515    .1576    Second-factor
    .3447 |  .1344    .1188   (.1188)   .0515    .1576    matrix, using
    .1495 |  .0583    .0515    .0515   (.0224)   .0683    temporary signs.
    .4573 |  .1783    .1576    .1576    .0683   (.2091)

G.          (.0276)  -.0199   -.0199    .0109    .0012
            -.0199   (.0440)   .0173   -.0465    .0052    Second residual
            -.0199    .0173   (.0440)  -.0465    .0052    matrix.
             .0109   -.0465   -.0465   (.0640)   .0180    E - F
             .0012    .0052    .0052    .0180  (-.0296)

            -.0001    .0001    .0001   -.0001    .0000    Columns check
                                                          to zero.


Notes.—It is fortuitous that all the entries in E are positive. 
Usually some will be negative. 

In the check for the residual matrices, a discrepancy from zero 
in the last figure is often to be expected, even of three or four units 
in a large matrix. 

Note the negative value occurring in a diagonal cell in G. 


Further stages would be carried on in the same way. 
But at each stage the residues will be examined to see if 
further analysis is worth while, by methods indicated later. 
Meanwhile let us assume in the present example that no 
more factors need be extracted. 

The matrix of loadings of common factors thus arrived
at is, after we have replaced the proper signs in
Loadings II, shown below:

              Approximate Method             True Values
Test       I         II      Communality     Communality
 1       .6852     .3898       .6214           .7000
 2       .7509    -.3447       .6827           .7000
 3       .7509    -.3447       .6827           .7000
 4       .3929    -.1495       .1767           .1303
 5       .5966     .4573       .5651           .5000

                              2.7286          2.7303

The communalities .6214, etc., are the sums of the
squares of the two loadings. For comparison with the
approximate communalities thus obtained there are shown
the true values, which in this artificial case are known to
us (see Section 9). This is for instructional purposes
only; the comparison is not intended as any criticism of
Thurstone's method of approximation. As has been
explained, this method is used only on large batteries, and
it is a very severe test indeed to employ it on a battery of
only five tests.
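
The whole approximate process of this section, from the matrix with guessed diagonal to the second residual matrix, can be sketched in Python as below. The matrix names A, C, D, E and G follow the text; the function names are illustrative, and the sign-changing rule is the simple one used in the example (make the column with the largest sum disregarding signs wholly positive), which need not suffice for every matrix.

```python
import math

# Correlations of the five tests; diagonal cells are filled in by the procedure.
R = [
    [0.0, 0.4, 0.4, 0.2, 0.5883],
    [0.4, 0.0, 0.7, 0.3, 0.2852],
    [0.4, 0.7, 0.0, 0.3, 0.2852],
    [0.2, 0.3, 0.3, 0.0, 0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]
N = len(R)


def with_guessed_diagonal(m):
    """Put the largest coefficient of each column (sign ignored) in its diagonal cell."""
    guess = [max(abs(m[i][j]) for i in range(N) if i != j) for j in range(N)]
    return [[guess[j] if i == j else m[i][j] for j in range(N)] for i in range(N)]


def centroid_loadings(m):
    """Column sums divided by the square root of the grand total."""
    sums = [sum(m[i][j] for i in range(N)) for j in range(N)]
    root = math.sqrt(sum(sums))
    return [s / root for s in sums]


def residual(m, loadings):
    """Subtract the factor matrix (products of loadings) from m."""
    return [[m[i][j] - loadings[i] * loadings[j] for j in range(N)] for i in range(N)]


def reflect(m):
    """Change signs of tests so that the column with the largest sum
    (disregarding signs) becomes wholly positive."""
    k = max(range(N), key=lambda j: sum(abs(m[i][j]) for i in range(N)))
    signs = [-1 if m[i][k] < 0 else 1 for i in range(N)]
    return [[signs[i] * signs[j] * m[i][j] for j in range(N)] for i in range(N)], signs


A = with_guessed_diagonal(R)
loadings_1 = centroid_loadings(A)
C = residual(A, loadings_1)                 # first residual matrix, A - B
D = with_guessed_diagonal(C)
E, signs = reflect(D)
temporary = centroid_loadings(E)            # Loadings II with temporary signs
loadings_2 = [s * b for s, b in zip(signs, temporary)]
G = residual(E, temporary)                  # second residual matrix, E - F
communalities = [a * a + b * b for a, b in zip(loadings_1, loadings_2)]

print([round(x, 4) for x in loadings_1])
print([round(x, 4) for x in loadings_2])
print([round(x, 4) for x in communalities])
```

Because the sketch carries full precision throughout, its figures may differ from the printed four-place values in the last decimal.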

12. Iteration of the process to improve the communalities.—
We might now go back and begin our whole calculation
again, using the communalities .6214, etc., arrived at by
the first approximation. This does not seem often to be
done in practice, most workers being content with the
approximation first arrived at. If we repeat the calculation
again and again with our present example, on each
occasion using as communalities the sum of the squares of
the loadings given by the preceding calculation, we get the
following sets of approximations, closer and closer to the
true communalities*:

                              h1²     h2²     h3²     h4²     h5²
First trial communalities    .5883   .7000   .7000   .3000   .5883
Next approximation           .6214   .6827   .6827   .1767   .5651
Next approximation           .6381   .6970   .6970   .1477   .5892
Next approximation           .6535   .7043   .7043   .1397   .5253
True values                  .7000   .7000   .7000   .1303   .5000

* In these repetitions we do not, as in the case of the first guess,
alter the diagonal cells in each matrix of residues: we retain the
diagonal residues without change.
The example has served to show how to work the
iterative method of approximating to the communalities. 
Being an artificial example, and not overlaid with sampling 
error, it has had the advantage of allowing us to compare 
the approximations with the true values. But it must be 
remembered that a real experimental matrix is not likely 
to have an exact low rank to which approximation can 
converge as here. In that case the approximations will 
presumably give an indication of the low rank which the 
matrix nearly has, which it might be made to have by small 
adjustments in its elements. 

It should be pointed out that iteration of each factor 
extraction separately will not give the same result. By 
iteration of the factors one by one we mean that after the 
loadings of the first factor are obtained they are squared 
and put into the diagonal cells as new communalities, and 
this is repeated again and again until the communalities 
remain unchanged. When this point is reached, the orig- 
inal matrix of correlations has been reduced as nearly to 
rank one as is possible. 

If the residues, after removal of the first factor, are then 
(after sign-changing) treated in the same way, they in 
turn will be reduced as nearly as possible to rank one. 
And so with successive residues, each matrix of residues 
being in succession reduced as nearly as possible to rank 
one by iteration of the one summation only. This process, 
although much easier than reiterating the whole process, 
and to that extent excusable, will not give the lowest pos- 
sible rank for the whole. Consider, for example, the 
correlations of the five tests used above on page 82. When 
communalities are reiterated with the first factor only, 
they settle down rapidly (the reader should check this) to— 


    .4571    .5421    .5421    .1261    .2729

When the residues then left are taken, and a factor taken out
and iterated, the communalities settle down to

    .1677    .1003    .1003    .0113    .1680

The sum of these first-factor and second-factor sets is the
set

    .6248    .6424    .6424    .1374    .4409

These, however, if inserted in the diagonal cells of the
original matrix, do not reduce it exactly to rank two, as
can be done by the true communalities

    .7000    .7000    .7000    .1303    .5000

Iteration over two factors, as shown in the table on page 88,
produces with four repetitions the approximations

    .6535    .7043    .7043    .1397    .5253

and (since in this artificial example rank two can be exactly
reached) would ultimately converge to the above true
values, though at the expense of much labour, for the
convergence is slow. The iteration of each factor separately,
however, would never converge to the true values. The
above values (.6248, etc.) are final, and yet do not give
rank two.
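
The iteration of each factor separately, which the reader is invited to check, can be sketched as follows. The starting guess (largest coefficient in each column), the fixed number of rounds and the function names are assumptions of this sketch, not part of the method as described.

```python
import math

R = [
    [0.0, 0.4, 0.4, 0.2, 0.5883],
    [0.4, 0.0, 0.7, 0.3, 0.2852],
    [0.4, 0.7, 0.0, 0.3, 0.2852],
    [0.2, 0.3, 0.3, 0.0, 0.1480],
    [0.5883, 0.2852, 0.2852, 0.1480, 0.0],
]
N = len(R)


def iterate_one_factor(m, rounds=200):
    """Centroid loadings of a single factor, re-inserting their squares as
    communalities each round (the diagonal of m itself is ignored)."""
    diag = [max(abs(m[i][j]) for i in range(N) if i != j) for j in range(N)]
    loadings = []
    for _ in range(rounds):
        sums = [diag[j] + sum(m[i][j] for i in range(N) if i != j)
                for j in range(N)]
        root = math.sqrt(sum(sums))
        loadings = [s / root for s in sums]
        diag = [a * a for a in loadings]
    return loadings


def reflect(m):
    """Change signs of tests so that the column with the largest off-diagonal
    sum (disregarding signs) becomes wholly positive."""
    k = max(range(N),
            key=lambda j: sum(abs(m[i][j]) for i in range(N) if i != j))
    signs = [-1 if i != k and m[i][k] < 0 else 1 for i in range(N)]
    return [[signs[i] * signs[j] * m[i][j] for j in range(N)] for i in range(N)]


first = iterate_one_factor(R)
residues = [[R[i][j] - first[i] * first[j] for j in range(N)] for i in range(N)]
second = iterate_one_factor(reflect(residues))

first_set = [a * a for a in first]
second_set = [b * b for b in second]
combined = [x + y for x, y in zip(first_set, second_set)]
print([round(x, 4) for x in first_set])
print([round(x, 4) for x in second_set])
print([round(x, 4) for x in combined])
```

The three printed sets correspond to the first-factor, second-factor and summed communalities quoted above; the summed set stays short of the true values however many rounds are run.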

13. Other methods of assessing the communalities.—The
labour of finding the minimum communalities by iteration
is so great that methods of improving the first guess are
desirable. Medland (Pmka. 1947, 12, 101-10) has tried
nine such methods on a correlation matrix with 63
variables. A method entitled Centroid No. 1 method seemed
to be best. A sub-group is chosen of from three to five
tests which correlate most highly with the test whose
communality is wanted. The highest correlation t in each
column of the sub-group is inserted in the diagonal cell,
and the columns summed. The grand total is also found.
Then the estimate of h² is

    (Σr + t)² / (ΣΣr + Σt)

where the numerator is the square of the column total,
and the denominator is the grand total. Thus if the
correlations of the sub-group were

    (.72)    .72     .63     .24
     .72    (.72)    .47     .59
     .63     .47    (.63)    .41
     .24     .59     .41    (.59)

    2.31    2.50    2.14    1.83   = 8.78

the estimate of h1² would be

    2.31² / 8.78 = .608

Clearly the same sub-group will usually serve for more than
one of its members. Thus from the above example h2²
can be estimated to be .712.

A graphical method, for which the reader is referred to
Medland's article, was about equally accurate but more
laborious. Rosner (Pmka. 1948, 13, 181-4) gives an
algebraic solution for the communalities depending upon the
Cayley-Hamilton theorem that any square matrix satisfies
its own characteristic equation, but adds that the method
“is not at all suited for practical purposes. The
computational labour is prohibitive.” It is, however,
interesting theoretically and may suggest new advances.
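
The Centroid No. 1 estimate described above amounts to one line of arithmetic per test, as this small sketch shows. The sub-group is the one printed in the text, with the highest correlation of each column already inserted in its diagonal cell; the function name is illustrative.

```python
# Sub-group of four tests, highest correlation of each column in the diagonal.
SUBGROUP = [
    [0.72, 0.72, 0.63, 0.24],
    [0.72, 0.72, 0.47, 0.59],
    [0.63, 0.47, 0.63, 0.41],
    [0.24, 0.59, 0.41, 0.59],
]


def centroid_no1_estimates(matrix):
    """Square of each column total divided by the grand total."""
    n = len(matrix)
    column_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_totals)
    return [total * total / grand_total for total in column_totals]


estimates = centroid_no1_estimates(SUBGROUP)
print([round(h2, 3) for h2 in estimates])
```

The first two entries are the estimates for the first and second tests of the sub-group; the remaining entries are the corresponding estimates for its other members.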


CHAPTER VI 
THE GEOMETRICAL PICTURE


1. The scatter-diagram of two tests.—A well-known way
of representing correlation, and that used by Sir Francis
Galton who devised correlation coefficients, is by a
scatter-diagram. The scores in two tests are used as rectangular
abscissae and ordinates, and each person represented by a
dot. Thus, if a person makes a score of X = 72 in a Test 1
and of Y = 59 in a Test 2, he is represented by the point P.
The two tests are represented by the rectangular axes.

Figure 13.

If a large number of persons take the two tests, their points
form the “scatter-diagram,” looking like a lot of shots at a
target. The dots are most densely crowded together near
a point whose co-ordinates are the average scores in the two
tests. If there is no correlation between the two tests,
and suitable units are used, the dots will thin out equally
in all directions, forming a circular-shaped group. If, on
the other hand, there is correlation, the group of dots will
be elliptical in appearance, with an axis slanting-wise
inclined to the test lines; and more and more elliptical
the closer the resemblance of the scores, the higher, that
is, the correlation. If we have first standardized the scores,
the test lines will pass through the centre of the group, the
average, and the axis of the ellipse will be equally inclined
to both tests. In Figure 14 it is indicated how the
elliptical group of dots narrows in the one direction, and
lengthens in the other, with increasing correlation. The
circle corresponds to zero correlation, the fat ellipse to
r = .5, the long thin one to r = .9. In perfect correlation
all the dots would be on a line. In negative correlation
the ellipse would be slanting the other way. These
ellipses must not be looked upon as bounding the group of
dots, which thins out to an indefinite distance. They are
like contours of a hill, being, in fact, “contours” of the
density of the dots.

Figure 14.
2. Three tests.—When we have three tests we need
three rectangular axes, like the three lines which meet in
the corner of a room. A person's three scores, measured
along these lines, define a point in solid space, a point in
the room. The points thus representing a large number
of persons will form a swarm in the room, congregated
most thickly round the man who is average in all three
tests, like a swarm of bees round the queen. If there is no
correlation between any of the tests and suitable units are
used, the swarm will be globular, but if there is correlation
it will lengthen into an ellipsoidal shape like a Rugby
football or a Zeppelin, though its waistline need not be
circular. In place of the ellipses of the two-dimensional
figure, we now have ellipsoidal shells of equal density of the
dots representing persons. One such is shown in Figure
15, which the reader can imagine as being the room in
which he is seated, the test lines, in their positive halves,
being represented by the three edges of floor and walls
which meet in a corner, where the point representing the
average man is placed. The ellipsoidal swarm is then
partly in the room, partly outside and below it. The part
of the swarm in the room (in the positive octant, that is) is
composed of persons scoring above the average in all three
tests. The silhouette of the ellipsoid, projected at right
angles on to a wall or the floor of the room, will be a
correlational ellipse due to the two tests edging that wall,
or edging the floor. These three silhouettes will in general
be different, depending on the adiposity, as it were, of the
ellipsoid.

Figure 15.

When we have more than three tests we cannot make or
easily imagine a similar model, for we know in real life
only space of three dimensions. But mathematically we
still can conceive of as many rectangular axes as there are
tests, in a “space” of more dimensions, of as many
dimensions, indeed, as the number of tests. And we still
speak of the “ellipsoidal” shape of the swarm of persons.

3. The four quadrants.—Let us now return to the case
of two tests. If the persons tested are numerous it will,
with most tests, be found that the numbers in the two
quadrants marked a are approximately equal (the axes
being drawn, it is understood, through the average score
of each test) and, similarly, the numbers in the two
quadrants marked b in the figure.

        b | a
        --+--
        a | b

A portion a of the crowd of persons, that is, get scores
above the average in each test, and an equal portion a are
below the average in each. These people add to the
correlation between the tests, whereas the others, in the
b quadrants, are all good in one but bad in the other test
and detract from it.


ERRATA (FIFTH EDITION)

The last complete sentence on page 94 and the last
sentence of section 5 on page 97 are incorrect and should
be deleted. The major axis is not equally inclined, in
general, to the orthogonal test lines.
    r = cos θ = cos (1000/3000 × 180°) = cos 60° = 0.5


Actual correlation tables will, of course, not show such 
complete equality in the opposite quadrants, and, more- 
over, the reader must beware of applying this formula 
unless the dividing lines are drawn through the means. 

4. Making the crowd circular.—We are next going to
make a change in our model by rotating the two test
vectors, hitherto at right angles, towards one another until
the angle between them is the above angle θ, whose cosine
is the correlation coefficient. A person's point P will still
be located at the point where the two perpendiculars from
his scores meet. The rotation of the test lines towards one
another, pivoted on the average man at the point where
they cross, will, however, move the dots representing
persons, and move them in such a way that the elliptical
shape of the crowd disappears and it becomes circular.
The presence of correlation is not now shown by the
configuration of the crowd, but by the angle between the test
lines. The cosine of this angle is the correlation coefficient.


If we guide the eye by drawing a dotted line at right
angles to each test line, we see that our former quadrants
a and b are now represented by sectors of the circular
crowd. Perpendiculars from any point in the white sectors
a on to the test lines both fall on the same side of the
average: all persons situated in these sectors are either
above the average in both tests (like P) or below in both.
Anyone, on the other hand, whose point is in the shaded
sector b is above the average in one of the tests and below
in the other. Those in a add to the correlation, those in b
diminish it. If correlation is perfect, the two test lines
must be brought together until they coincide: and then
the dotted lines will also coincide and the sector b will
disappear. If, on the other hand, the correlation is low,
the test lines will have to be farther apart, and the sector b
will increase, until, when correlation is zero, the test lines
are at right angles and the sectors a and b are equal
and balance one another, the pros equal to the cons.
For negative correlation the angle θ between the test
lines becomes obtuse, and the sectors b larger than the
sectors a.

5. Ellipsoid into sphere.—With three tests we saw that
the solid “scatter-diagram,” made with the test lines at
right angles to one another, was ellipsoidal in form. Just
as we converted the elliptical two-dimensional
scatter-diagram into a circular crowd of dots by bringing the test
lines closer together, until the cosine of the angle between
them equalled the correlation coefficient, so with the
ellipsoidal swarm of dots when we have three tests. If we take
hold of the three test lines and swivel them nearer to each
other, until the angle between each pair represents their
correlation coefficient by its cosine, we then find that the
ellipsoid has become a sphere.


6. A wire model.—Let us suppose we want to make a
wire model of this arrangement of three test lines, supposing
that we have calculated by the usual product-moment
formula the three correlation coefficients. Choosing any
two of the tests, we find from a table of cosines what angle
has a cosine equal to their correlation coefficient, and we
lay two straight wires on the table crossing one another at
this angle, like an X. Imagine them soldered together at
the point where they cross, which represents the man
average in each test.

Now consider the third test, and look up the angles
whose cosines equal its correlation coefficients with Tests
1 and 2. The wire for this third test must be so placed as
to make these angles with the first two wires, and we find
that it will not lie flat on the table but sticks up at an angle,
and its negative half has to go through the table and stick
out below it. If we solder the three wires together where
they cross (at the point representing the man who gets
the average score in each of the three tests) and pick them
up, they form a double tripod.
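
The placing of the three wires can be worked out numerically. The sketch below is an illustration only: it lays wires 1 and 2 flat on the table (the plane z = 0), finds the direction of wire 3 from the three correlations, and uses hypothetical correlations of .4, .4 and .7; the function names are not from the text.

```python
import math


def wire_directions(r12, r13, r23):
    """Unit vectors along three test wires whose mutual angles have the
    given correlations as cosines; wires 1 and 2 lie flat on the table."""
    w1 = (1.0, 0.0, 0.0)
    w2 = (r12, math.sqrt(1.0 - r12 ** 2), 0.0)
    x = r13                                # cosine of the angle with wire 1
    y = (r23 - r12 * r13) / w2[1]          # fixed by the angle with wire 2
    z = math.sqrt(1.0 - x * x - y * y)     # height of wire 3 above the table
    return w1, w2, (x, y, z)


def cosine(u, v):
    """Cosine of the angle between two unit vectors."""
    return sum(a * b for a, b in zip(u, v))


w1, w2, w3 = wire_directions(0.4, 0.4, 0.7)
angles = {
    "1-2": math.degrees(math.acos(cosine(w1, w2))),
    "1-3": math.degrees(math.acos(cosine(w1, w3))),
    "2-3": math.degrees(math.acos(cosine(w2, w3))),
}
print({pair: round(angle, 1) for pair, angle in angles.items()})
```

The third component of `w3` is what makes the wire stick up from the table; if the three correlations were mutually inconsistent, the quantity under the last square root would be negative and no such model could be built.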
7. Two kinds of space. It will be seen that we have 
described two geometrical ways of representing correlation 
using two different spaces. In the.one kind of space, the 
test lines are at right angles to one another, or orthogonal, 
and the presence of correlation is shown by the fact that 
the swarm of dots representing persons is not spherical but 
ellipsoidal. à 

In the other kind of space, the crowd of dots representing 
persons is spherical and the presence of correlation is 
shown by the test lines not being orthogonal but at angles 
with one another whose cosines equal the correlation 
coefficients. 

In both kinds of space, a person’s scores in the tests are 

found by dropping perpendiculars from his point on to the 
test lines. The distances of the feet of these perpendiculars 
from the origin—that is, from the point where the test lines 

eross are his scores in the tests. 
If the test lines in this second kind of space are swivelled 
back into orthogonality, the person-points will move, will 
cease to be spherical in contour; and become ellipsoidal. 
All this is true, not only for three-dimensional space, when 
we have only three tests, but for multi-dimensional space 
needed to represent many tests and their inter-correlations. 
The algebra is exactly the same for any number of dimen- 
sions, and we continue, in the larger spaces, to use by 
analogy the terms we are accustomed to in real space, such 
as sphere, ellipsoid, etc.
8. A still larger space. Another way of arriving at the
second of the above two kinds of space—the spherical one, 
in which the cosines equal the correlation coefficients—is 
to begin with a much larger space, of as many dimensions 
as there are persons, who are therein represented by 
orthogonal axes. If along each person’s axis we set off 
the score he gets in a given test, say Test 1, these abscissae
will define a point in the space representing that test. In
the same way each test can be represented by a point. It
is a scatter-diagram with the usual rôles of tests and persons
exchanged. 

These test points will usually be much less numerous than 
the persons, and they define a sub-space of dimensions 
equal to the number of tests. This sub-space, if the test 
scores have been normalized,* is the same as our spherical 
space, and the lines joining the origin to the test points are 
our former lines, separated by angles whose cosines equal 
the correlation coefficients. 
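
The construction of this section can be checked on a small scale. The sketch below is only an illustration and not part of the original argument: the raw scores are invented, and "normalized" is assumed to mean that each test's scores are measured from their mean and scaled so that their squares add up to unity. Under that assumption, the cosine of the angle between two test points in the person-space should agree with the ordinary product-moment correlation.

```python
import math

def normalize(scores):
    """Measure scores from their mean and scale them to unit length."""
    mean = sum(scores) / len(scores)
    deviations = [s - mean for s in scores]
    length = math.sqrt(sum(d * d for d in deviations))
    return [d / length for d in deviations]

def cosine(u, v):
    """Cosine of the angle between two lines drawn from the origin."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pearson(x, y):
    """Product-moment correlation computed directly from the raw scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw scores of six persons in two tests (invented for illustration).
test_1 = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0]
test_2 = [30.0, 33.0, 25.0, 41.0, 29.0, 31.0]

# Each test becomes one point in a space with one axis per person.
point_1 = normalize(test_1)
point_2 = normalize(test_2)

print(round(cosine(point_1, point_2), 6), round(pearson(test_1, test_2), 6))
```

The two printed numbers should coincide, which is the sense in which the lines joining the origin to the test points are "separated by angles whose cosines equal the correlation coefficients."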

9. Factor axes. The problem of factorial analysis is to
decide upon a set of axes to use in the space in which the
test lines exist. Let us explain this first of all in the
simplest case, that of two tests, represented by their lines
in a plane, at the angle corresponding to their correlation.

In this case, the most natural way of drawing orthogonal
axes on the paper is to place one of them (see Figure 17)
half-way between the test vectors, and the other, of course,
at right angles to the first. Of these two factor axes, OA is
as near as it can be to both test lines.

[Figure 17]

We pictured, before, a swarm of ten thousand dots on 
the paper, each representing a person by his scores in the 
two tests, found by dropping perpendiculars from his dot 
to the two vectors. Instead of describing each point (each 
person, that is) by the two test scores, it is clear that we 
could describe it by the two factor scores—the feet of 


* See footnote, page 6. 
perpendiculars on to the factor axes. It is also clear
that, as far as this purpose goes, we might have taken
our factor axes anywhere, and not necessarily in the
positions OA and OB, provided they went through the point O
and were at right angles. In other words, we can “ rotate ”
OA and OB round the point O, and any position is equally
good for describing the crowd of persons. Either of the
tests, indeed, might be made one of the factors. The 
positions shown in Figure 17 are advantageous only if we 
want to use only one of our factors and discard the other, 
in which case obviously OA is the one to keep, as it lies 
as near as possible to both test axes. The scores along OA 
are the best possible single description of the two test 
results. 
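
A short worked form of this two-test case may help. It is a sketch under the assumptions already made in the text (unit test vectors, OA bisecting the angle between them), with θ, r and the example value r = .5 introduced here only for illustration.

```latex
% Two unit test vectors at angle \theta, with \cos\theta = r.
% OA bisects the angle, so each test line makes \theta/2 with OA.
\begin{aligned}
\text{loading of each test on } OA &= \cos\tfrac{\theta}{2} = \sqrt{\tfrac{1+r}{2}},\\
\text{loading of each test on } OB &= \pm\sin\tfrac{\theta}{2} = \pm\sqrt{\tfrac{1-r}{2}},\\
\text{variance described by } OA &= 2\cos^2\tfrac{\theta}{2} = 1 + r,\\
\text{variance described by } OB &= 2\sin^2\tfrac{\theta}{2} = 1 - r.
\end{aligned}
% Example (hypothetical): r = .5 gives \theta = 60^\circ, loadings
% \cos 30^\circ \approx .866 on OA and \pm .5 on OB, so OA accounts
% for 1.5 of the total variance 2, and OB for the remaining .5.
```

For any positive correlation the share of OA exceeds that of OB, which is why OA is the axis to keep if only one is wanted.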

10. Spearman axes for two tests. The orthogonal axes
chosen by Spearman for his factors are, however, none of 
the positions to which OA and OB can be rotated in the 
plane of the paper. Besides, Spearman has three factors, 
and therefore three axes, for two tests, namely the general 
factor and the two specific factors, and we cannot have 
three orthogonal axes or factor vectors on a sheet of paper. 
The Spearman factors must, for two tests, lie in three- 
dimensional space, like the three lines which meet in the 
corner of a room. If we rotate the OA and OB of Figure 17
out of the plane of the paper (say, pushing A below the
surface of the paper, and, say, raising B above it), we shall 
clearly have to add a third axis, at right angles to OA and 
OB, to enable us to describe the tests and the persons who 
remain on the paper. There are now three axes to rotate ; 
and they must rotate rigidly, remaining at right angles to 
one another. The point at which Spearman stops the 
rotation, and decides that the lines then represent the 
“ best ” factors, is a position in which one of the axes is
at right angles to Test X, and another is at right angles to 
Test Y. The third axis then represents g. 

11. Spearman axes for four tests. We are accustomed to
depicting three dimensions on a flat sheet of paper, and
so we can, in Figure 18, represent the Spearman axes g, s₁,
and s₂ for two tests. And since we have begun to depict
other dimensions, by means of perspective, on a flat sheet,
let us continue the process and by a kind of super-perspective
imagine that the lines s₃, s₄, and any others we
may care to add, represent axes sticking out into a fourth,
a fifth, and higher dimensions. Figure 18 thus represents
the five Spearman axes for four tests, of which only the
line of the first test is shown (in its positive half only).

All the five lines g, s₁, s₂, s₃, and s₄ must be imagined as
being each at right angles to all the others in five-dimensional
space. The line of Test 1, shown in the diagram, lies in the
plane or wall edged by g and s₁. It forms acute angles with g
and with s₁, the cosines of which angles are its saturations
with g and s₁ respectively. If it had been highly saturated
with g, it would have leaned nearer to g and farther away
from s₁.

[Figure 18]

The other three axes, s₂, s₃, and s₄, are all at right angles
to the wall or plane in which Test 1 lies. They have,
therefore, no correlation with Test 1, no share in its
composition. Test line 2 similarly lies in the wall edged
by g and s₂, test line 3 in that edged by g and s₃. The
axis g forms a common edge to all these planes. If the
battery of tests is hierarchical (that is, if the
tetrad-differences are all zero), then all the tests of the battery
can be depicted in this way, each in its own plane at right
angles to all the other planes, no test line being in the
spaces between the “ walls.”

The four test lines themselves, of course, are only in 
a four-dimensional space (a 4-space we shall say, for 
brevity). Just as, when we were discussing Figure 17, we 
said that Spearman used three axes which were all out of 
the plane of the paper, so here in Figure 18, with four test 
lines (only one shown) in a 4-space, Spearman uses five 
axes in a space of one dimension higher than the number 
of tests. For n hierarchical tests, Spearman’s factors are 
in an (n + 1)-space.

If along each test line we measure the same distance
as a unit, then perpendiculars from these points* on to the
g axis will give the saturations of the tests with g as fractions
of this unit distance. The four dots on the g axis in Figure
18 may thus be taken as representing the test vectors†
projected on to the “ common-factor space,” which is here 
a line, a space of one dimension only. Thurstone's system 
is like Spearman’s except that the common-factor space is 
of more dimensions, as many as there are common factors. 
Figure 19 shows the Thurstone axes for four tests whose 
matrix of correlation coefficients can be reduced to rank 2. 

12. A common-factor space of two dimensions. Here there
are two common factors, a and b, and four specifics, s₁,
s₂, s₃, and s₄. All the six axes representing these factors
in the figure are to be imagined as existing in a 6-space,
each at right angles to all the others. The common-factor
space is here two-dimensional, the plane or wall edged by a
and b; to make it stand out in the figure, a door and a
window have been sketched upon it.

In Spearman’s Figure 18, each test line lay in a plane
defined by g and one of the specific axes. Here in Figure
19, each test line lies in a different 3-space. These different
3-spaces have nothing in common with one another except
the plane ab, the wall with the door and window in the
diagram. In Figure 18 the projections of the unit test
vectors on to the common-factor space were lines which all
coincided in direction (though they were of different
lengths), for there the common-factor space was a line.
Here the common-factor space is a plane, and the projections
of the four test vectors on to that plane are shown
in the figure by the numbered lines on the “ wall.” These
lines, if they are all projections of vectors of unit length,
will by their lengths on the wall represent the square roots
of the communalities.

[Figure 19]

* These points are then the same as those arrived at by the process
described in Section 8 (page 99).

† A vector is a direction with a magnitude, and now that we have
measured unit distance along each test line, we may speak of unit
test vectors.
13. The common-factor space in general. When there are
r common factors, the common-factor space is of r dimen- 
sions, and the whole factor space (including the specifics) is 
of (n + r) dimensions. The test vectors themselves are in an 
n-space ; their projections on to the common-factor space 
are crowded into an r-space, and are naturally at smaller 
angles with one another than the actual test vectors are. 
These angles between the projected test vectors do not, 
therefore, represent by their cosines the correlations be- 
tween the tests. The angles are too small for that, and 
the cosines, therefore, too large. But if we multiply the 
cosine of such an angle by the lengths of the two projections 
which it lies between, we again arrive at the correlation. 
Thus in Figure 19, the angle between the lines 1 and 3
on the wall is less than the angle between the actual test
vectors 1 and 3 out in the 6-space, of which the lines on
the wall are the projections. But the lengths of the lines 1
and 3 on the wall are less than the unit length we marked
off on the actual vectors, being, in fact, the roots of the
communalities. If we call these lengths on the wall h₁ and h₃,
then the product h₁h₃ times the cosine of the projected
angle again gives the correlation coefficient.
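
A small numerical check of this statement follows. It is a sketch only: the loadings of Tests 1 and 3 on the two common factors are hypothetical values chosen for illustration, not figures from the text, and the names `loadings_1`, `loadings_3` and `direction` are introduced here.

```python
import math

# Hypothetical loadings of two tests on the common factors a and b
# (invented for illustration; they are not taken from Figure 19).
loadings_1 = (0.6, 0.3)
loadings_3 = (0.2, 0.7)

def length(v):
    """Length of the projection on the wall: the root of the communality."""
    return math.sqrt(sum(x * x for x in v))

def direction(v):
    """Angle, in radians, that the projected line makes with axis a."""
    return math.atan2(v[1], v[0])

h1 = length(loadings_1)
h3 = length(loadings_3)

# Correlation reproduced by the common factors: sum of products of loadings.
r13 = sum(p * q for p, q in zip(loadings_1, loadings_3))

# Angle between the two projections on the wall, and between the full
# unit test vectors out in the larger space (whose cosine is r13 itself).
projected_angle = abs(direction(loadings_3) - direction(loadings_1))
full_angle = math.acos(r13)

print(round(math.degrees(projected_angle), 2), round(math.degrees(full_angle), 2))
print(round(h1 * h3 * math.cos(projected_angle), 6), round(r13, 6))
```

The projected angle comes out smaller than the angle between the full vectors, yet multiplying its cosine by the two shortened lengths restores the correlation.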
14. Rotations. It will be remembered that Thurstone,
after obtaining a set of loadings for the common factors
by his method of analysis of the matrix of correlations, 
“ rotates ” the axes until the loadings are all positive— 
and he also likes to make as many of them as possible zero. 
It is instructive to look at this procedure in the light of our 
geometrical picture from which the phrase “ rotating the 
factors ” is taken. It should be emphasized first of all 
that such rotation of the common-factor axes in Thurstone’s
system must take place entirely within the common-factor
space, and the common-factor axes must not
leave that space and encroach upon the specifics. In
Figure 18, therefore, no rotation, in Thurstone’s sense, of
the g axis can be made (since the common-factor space is a
line), except, indeed, reversing its direction and measuring
stupidity instead of intelligence.

In Figure 19 the common-factor space is a plane, and 
the axes a and b can be rotated in this plane, like the hands 
of a clock fixed permanently at right angles to one another. 
When the positive directions of a and b enclose all the 
vector projections, as they do in our figure, then all the 
loadings are positive. The position shown would, there- 
fore, fulfil this desire of Thurstone’s. Moreover, one of 
the loadings could be made zero, by rotating a and b until 
a coincides with line 1 (when b will have no loading in
Test 1), or until b coincides with line 4 (when a will have 
no loading in Test 4). 
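
The effect of such a rotation on the loadings can be sketched as follows. The four pairs of loadings are hypothetical, chosen so that all projections lie between the positive directions of a and b; the function names are likewise introduced only for this illustration.

```python
import math

# Hypothetical loadings (on a, on b) of four tests, invented for illustration.
loadings = [(0.7, 0.2), (0.6, 0.4), (0.4, 0.6), (0.2, 0.7)]

def rotate(pairs, angle):
    """Loadings referred to axes a and b after turning both axes through `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in pairs]

def correlation(p, q):
    """Correlation reproduced by the two common factors."""
    return p[0] * q[0] + p[1] * q[1]

# Turn the axes until a coincides with the projected line of Test 1.
angle_to_line_1 = math.atan2(loadings[0][1], loadings[0][0])
rotated = rotate(loadings, angle_to_line_1)

for before, after in zip(loadings, rotated):
    print(before, tuple(round(x, 4) for x in after))
```

After the rotation the loading of b in Test 1 is zero, the loading of a in Test 1 equals the root of that test's communality, and every reproduced correlation is unchanged, since the axes have only been turned rigidly within the plane.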

When there are three common factors, the common-factor
space is an ordinary 3-space. The three common-factor
axes divide this space into eight octants. Rotating
them until all the loadings are positive means until all the 
projections of the test vectors are within the positive 
octant. This will always be nearly possible if the corre- 
lations are all positive. Moreover, it is clear that we can 
always make at any rate some loadings zero. In the 
common-factor 3-space we can move one of the axes until
it is at right angles to two of the test projections, in which 
tests that factor will then have no loading. Keeping that 
axis fixed, we can then rotate the other two axes round it, 
seeking for a position where one of them is at right angles 
to some test. The number of zero loadings obtainable 
will clearly be limited unless the configuration of the test 
vectors happens to lend itself to many zeros. We shall see 
later that Thurstone seeks for teams of tests which do this. 

Although Thurstone makes his rotations exclusively
within the common-factor space, keeping the specifics
sacrosanct at their maximum variance, there is, of course,
nothing to prevent anyone who does not hold his views
from rotating the common-factor axes into a wider space,
and increasing the number of common-factor axes at the
expense of the specific variance, until ultimately we reach as
many common factors as we have tests, and no specifics.

15. The geometrical picture of centroid analysis. Think
of a sheaf of lines representing a number of tests, with
angles corresponding to the correlations. Centroid analysis
means (if unities are used in the diagonal cells) finding a
line in the middle of this sheaf, at the centroid or resultant,
something like the stick in the middle of the ribs of a
slightly opened umbrella, except that our test lines are not
regularly spaced like those ribs.

All this is in a space of as many dimensions as there are
tests, and it is not possible to make a drawing. But if the
reader will be tolerant, we can make one of our
“ super-perspective ” drawings showing a sheaf of test lines (see
Figure 20) which must be imagined as being in a
multi-dimensional space. The centroid line OC is the line along
which the point O would move if each test line were a force
(all equal) pulling O. It is exactly like the parallelogram
of forces on a multi-dimensional scale. The dots on the
test lines are at unit distance from O. (They have been
joined by lines only in order to make the figure look more
solid.) The loadings of the tests in the first centroid
factor are the projections of these unit distances on to OC;
this is when unities are used in the diagonal cells. The
summation process gives, arithmetically, these projected
distances along OC.

[Figure 20]

The next part of the arithmetical process consisted in 
removing that part of the correlation coefficients explained 
by the first factor loadings. This means, in our space 
diagram, that the dimension parallel to OC is abolished, 
and all the test lines are projected on to a space at right 
angles to OC and of one dimension less than the original, 
(n - 1) dimensions instead of n, if n be the number of
tests. 

We have had perforce to draw our diagram as though 
it were in a three-fold instead of an n-fold space: and for 
this new (n - 1)-fold space we have drawn an ordinary
plane, like a drawing-board, at right angles to OC, and 
projected the five test lines on to it. The next thing is to 
find the centroid of these five directed lines, these vectors, 
on the drawing-board. But we find at once that they are 
in equilibrium. If they were forces, the point O would
not move. That is because OC is indeed the centroid of
the original lines. This fact of equilibrium corresponds to 
the fact that the columns of residues add up to zero. 
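
The arithmetic counterpart of this picture can be sketched briefly. The example below is an illustration only: it borrows the four-test correlation matrix used for the worked calculation in the next chapter, keeps unities in the diagonal cells, and assumes the usual summation rule of the centroid method (each loading is the column sum divided by the square root of the grand total).

```python
import math

# Four-test correlation matrix from the worked example of the next chapter,
# with unities in the diagonal cells.
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]
n = len(R)

# Summation process: column sums, then division by the root of the grand total.
column_sums = [sum(R[i][j] for i in range(n)) for j in range(n)]
grand_total = sum(column_sums)
centroid_loadings = [c / math.sqrt(grand_total) for c in column_sums]

# Remove the part of each cell explained by the first centroid factor.
residuals = [[R[i][j] - centroid_loadings[i] * centroid_loadings[j]
              for j in range(n)] for i in range(n)]
residual_column_sums = [sum(residuals[i][j] for i in range(n)) for j in range(n)]

print([round(a, 3) for a in centroid_loadings])
print([round(s, 12) for s in residual_column_sums])
```

Every residual column sums to zero (apart from rounding in the machine), which is the equilibrium of the projected vectors on the drawing-board; hence the need to reverse some of them before a second centroid can be found.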

To get over this, in the arithmetic, we changed the signs 
of some rows and corresponding columns, till, if possible, 
all cells were positive. (These cells of the residues are the 
cosines of the angles on the drawing-board, some of which 
are clearly obtuse, with negative cosines.) This reversal of 
signs in the arithmetic corresponds, in our diagram, to 
reversing some of the vectors on the drawing-board, till 
they again form a sheaf, as close as possible. Two are 
shown as reversed in our figure, and most of the angles are 
now acute, most of the cosines positive. It is desirable to 
make the sheaf as compact as possible, corresponding to 
making as many cells positive as possible. 

The centroid of the resulting sheaf of vectors (or forces)
is the second factor. Its dimension is next abolished,
by projection on to a space of (n - 2) dimensions, and so
on, and so on. Our possibility of following this in a
drawing is beyond delineation, but if the reader will in
imagination conceive of our first sheaf of test lines being in n
dimensions, and being step by step projected on to spaces
of (n - 1), (n - 2) and lesser dimensions, he will have a
picture corresponding to the arithmetical summation
process and the sign reversals in the residues.

For simplicity we have above supposed that unities 
were being left in the diagonal cells, in which case as many 
common factors would emerge as there were tests, and 
there would be no specifics. If communalities are inserted 
and the rank of the matrix of correlations reduced, there 
will be fewer common factors. Our diagram would then
be in the common-factor space and, indeed, can still serve, 
if we suppose the distances from O to the dots on the test 
lines to be not unity, but the square roots of the commun- 
alities, and the angles to be the projections of those between 
the full test lines. With that change, our diagram would 
be one for the communal parts of five tests with three 
common factors, represented by OC, by the resultant of 
the vectors on the drawing-board (after the reversals to 
destroy the equilibrium), and by a third line also on the 
drawing-board, at right angles again.

16. Principal components.—The object of using centroids 
as axes in the above process is to obtain axes in diminishing 
order of importance as describers of the test lines. In the 
current jargon, they each “ take out ” as much variance as 
possible at each step—or rather, not quite as much as 
possible, though nearly so. There is another set of lines 
which actually do take out as much as possible. They are 
the lines corresponding to the axes of the ellipsoid of 
Figure 15, or the more general ellipsoids of higher dimen- 
sions. The centroid OC in our Figure 20 is in such a 
position that the sum of the squares of the vertical distances 
of the test dots to it is very small, nearly as small as 
possible. Another line, however, quite close to OC and 
corresponding to the major axis of the ellipsoid, makes this 
sum of squares an absolute minimum, and the sum of 
squares of the loadings of the factor a maximum. 


In Section 5 above we spoke of converting the ellipsoid
of our Figure 15 into a sphere by swivelling the three test
lines nearer to each other till the cosines of their angles 
correspond to the correlation coefficients, and the test lines 
take up positions such as they have in our Figure 20. 
When this is done, the major axis of the ellipsoid takes up 
a position among the test lines, quite near to the centroid 
but not quite coinciding, and with the property of
maximizing the “ variance taken out.” Similarly, the other
principal axes of the ellipsoid, when the change is made in
the space, replace for the better the later centroids of the 
simpler process. The arithmetical method of calculating 
their loadings is explained in our next chapter. 


CHAPTER VII 
PRINCIPAL COMPONENTS


1. A historical accident.—By a historical accident, the 
method of principal components is associated in the minds 
of psychologists with analyses in which unities, and not 
communalities, are used in the diagonal cells of the square 
table of correlations. The centroid method can, however, 
equally well be used on such a table, giving the centroids of 
the complete test vectors in the whole test space: and the 
principal components of the communality vectors, in the 
common-factor space, can be found, using communalities in 
the diagonal cells, by the same iterative process as we are 
about to describe. As, however, this method was originally 
used on unit entries, we shall first make a principal com- 
ponents analysis of the whole tests of the example already 
used for the centroid process. Later we shall analyse the 
communality vectors by the same process (page 118). 

2. A calculation. The actual calculation of the loadings
of principal components requires, for its complete
understanding, a grasp of the method of finding algebraically the
principal axes of an ellipsoid, a problem which will be
found dealt with in three dimensions in any text-book on
solid geometry. We give an account of this, for n
dimensions, in the Appendix. Here we shall only explain
Hotelling’s (1933) ingenious iterative method of doing this
arithmetically, by means of an example, for which we shall
use the matrix of correlations already employed in Chapter
V to illustrate the centroid method (see page 108).

                                      Multipliers
                                 1st     2nd     3rd
     1.0     .4     .4     .2     .8     .78    .775
      .4    1.0     .7     .3    1.0    1.00   1.000
      .4     .7    1.0     .3    1.0    1.00   1.000
      .2     .3     .3    1.0     .7     .65    .637

     .80    .32    .32    .16
     .40   1.00    .70    .30
     .40    .70   1.00    .30
     .14    .21    .21    .70
    1.74   2.23   2.23   1.46

    .780   .312   .312   .156
    .400  1.000   .700   .300
    .400   .700  1.000   .300
    .130   .195   .195   .650
   1.710  2.207  2.207  1.406

Hotelling’s arithmetical process then begins with a guess 
at the proportionate loadings of the first principal com- 
ponent. Practically any guess will do—a bad guess will 
only make the arithmetic longer. We have guessed .8, 1,
1, .7, the numbers to be seen on the right of the matrix,
because these numbers are roughly proportional to the 
sums of the four columns, and such numbers usually give 
a good first guess. 

Each row of the matrix is then multiplied by the guessed 
number on its right, giving the matrix below the first one, 
beginning with .80. We then take, as our second guess,
numbers proportional to the sums of the columns of this
matrix,* namely

          1.74    2.23    2.23    1.46
giving     .78    1       1        .65

That is, we divide the sums of the columns by their largest
member, and use the results as new multipliers. They
are seen placed farther on the right of the original matrix.
It is unusual for two of them to be of the same size; that
is a peculiarity of our example.

It is always the original matrix whose rows are multiplied 
by each improved set of multipliers. The above set gives 
the next matrix shown, that beginning with .780, and the
sums of its columns

          1.710   2.207   2.207   1.406

give a third guess at the multipliers, namely

           .775   1       1        .637


* When a calculating machine is being used, this matrix will not 
be actually written down—the column sums will be arrived at on the 
machine. 
And so the reiteration goes on, and the reader, who is
advised to carry it a stage farther at least, would find if he
persevered that the multipliers would change less and less.
If he went on long enough, he would reach this point
(usually, however, far fewer decimals are sufficient) :

     1.0     .4     .4     .2      .772865
      .4    1.0     .7     .3     1.000000
      .4     .7    1.0     .3     1.000000
      .2     .3     .3    1.0      .629811

           .772865    .309146    .309146    .154573
           .400000   1.000000    .700000    .300000
           .400000    .700000   1.000000    .300000
           .125962    .188943    .188943    .629811

          1.698827   2.198089   2.198089   1.384384
giving     .772865   1          1           .629813


that is, totals in exactly the same proportion as the
multipliers. These final multipliers (or earlier ones if the
experimenter is content with less exact values) are then
proportionate to the loadings of the first principal component in
the four tests. They have, however, to be reduced until
the sum of their squares equals the largest total, 2.198089,
which is called the first latent root of the original
matrix. This is done by dividing them by the square root
of the sum of their squares and multiplying them by the
square root of the latent root. They then become

           .662    .857    .857    .540
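
The repeated weighting and summing just described can be sketched in a few lines. This is an illustrative rendering of the arithmetic in the text, not Hotelling's own notation; the function names and the fixed number of rounds (60) are assumptions made here for convenience.

```python
import math

# Correlation matrix of the four tests, unities in the diagonal cells.
R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]

def column_sums_after_weighting(matrix, multipliers):
    """Multiply each row by its multiplier, then add up every column."""
    n = len(matrix)
    return [sum(multipliers[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def hotelling_first_component(matrix, guess, rounds=60):
    """Repeat the weighting step; return (multipliers, latent root, loadings)."""
    multipliers = list(guess)
    largest = 0.0
    for _ in range(rounds):
        sums = column_sums_after_weighting(matrix, multipliers)
        largest = max(sums)                     # the largest column total
        multipliers = [s / largest for s in sums]
    # Rescale so that the squares of the loadings add up to the latent root.
    scale = math.sqrt(largest / sum(m * m for m in multipliers))
    loadings = [m * scale for m in multipliers]
    return multipliers, largest, loadings

multipliers, root, loadings = hotelling_first_component(R, [0.8, 1.0, 1.0, 0.7])
print([round(m, 6) for m in multipliers])
print(round(root, 6))
print([round(a, 3) for a in loadings])
```

Because it is always the original matrix that is weighted afresh, a poor first guess costs only extra rounds; the multipliers settle to the same proportions whatever positive guess is used.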

The next step in Hotelling’s process is similar to one
with which we have already become familiar in Thurstone’s
method. The parts of the variances and correlations
due to this first component are calculated and
subtracted from the original experimental matrix. These
variances and correlations due to the first component
are shown below.

Matrix due to first principal component:

               .662     .857     .857     .540
     .662      .439     .567     .567     .357
     .857      .567     .734     .734     .462
     .857      .567     .734     .734     .462
     .540      .357     .462     .462     .291

Residual matrix, with the guessed multipliers on its right:

      .561    -.167    -.167    -.157        .3      .18
     -.167     .266    -.034    -.162       -.4     -.38
     -.167    -.034     .266    -.162       -.4     -.38
     -.157    -.162    -.162     .709       1.0     1.00

     -.157    -.162    -.162     .709    (fourth row, multiplied by 1.0)
      .145    -.305    -.305     .792    (column totals)

The residual matrix is then treated in exactly the same
way as the original matrix, the beginnings of the process
being shown above. There is no need, in this process, for
sign-changing. The guessed multipliers, proportional to
the sums of the columns, are not so near the truth this
time, for the first one, which we have guessed at .3, and
which reduces after one operation to .18, goes on reducing
until it becomes negative, the final values of these second
loadings being as shown in the appropriate column of the
following table, which also gives the loadings of the third
and fourth factors, obtained in the same way. The
variances and correlations due to each factor in turn are
subtracted from the preceding residual matrix and the new
residual matrix analysed for the next factor :


                              Factor                           Sum of
                 I          II         III        IV          squares

   Test 1     .662218   -.323324    .675967    0                 1
   Test 2     .856836   -.135197   -.312332   -.387298           1
   Test 3     .856836   -.135197   -.312332    .387298           1
   Test 4     .539645    .826092    .162323    0                 1

   Sum of
   squares*  2.198090    .823526    .678383    .300000           4

   Percentages  55.0      20.6       16.9        7.5            100


* These four quantities are, in the Hotelling process, what are
called the “ latent roots ” of the matrix. Their product gives the
value, .3684, of the determinant of the matrix of correlation
coefficients.
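
The whole table can be reproduced by repeating the iteration on each residual matrix in turn. The sketch below is an illustration under stated assumptions rather than the procedure exactly as printed: each trial vector is kept at unit length instead of being divided by its largest member, the latent root is read off as the stretch the matrix gives to the converged direction, and the starting guess (1, 2, 3, 4) is an arbitrary choice made here so that it has some share of every component (a guess proportional to the column sums would fail for the fourth component, whose residual columns sum to zero).

```python
import math

R = [
    [1.0, 0.4, 0.4, 0.2],
    [0.4, 1.0, 0.7, 0.3],
    [0.4, 0.7, 1.0, 0.3],
    [0.2, 0.3, 0.3, 1.0],
]

def times(matrix, vector):
    """Product of a square matrix with a column of numbers."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def unit(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]

def principal_components(matrix, start, rounds=2000):
    """Latent roots and loadings, one component at a time, by iteration and subtraction."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    roots, loadings = [], []
    for _ in range(n):
        direction = unit(start)
        for _ in range(rounds):
            direction = unit(times(residual, direction))
        # Latent root: the factor by which the matrix stretches this direction.
        root = sum(d * w for d, w in zip(direction, times(residual, direction)))
        column = [d * math.sqrt(root) for d in direction]
        if max(column, key=abs) < 0:            # make the largest loading positive
            column = [-a for a in column]
        roots.append(root)
        loadings.append(column)
        # Subtract the variances and correlations due to this component.
        residual = [[residual[i][j] - column[i] * column[j] for j in range(n)]
                    for i in range(n)]
    return roots, loadings

roots, loadings = principal_components(R, start=[1.0, 2.0, 3.0, 4.0])
for root, column in zip(roots, loadings):
    print(round(root, 6), [round(a, 6) for a in column])
print(round(sum(roots), 6), round(roots[0] * roots[1] * roots[2] * roots[3], 4))
```

Each printed row is one column of the table. In the fourth component the two non-zero loadings are equal in size and opposite in sign, so which of them is printed as positive is a matter of convention.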
An alternative method of finding principal components,
due to Kelley, is to deal with the variables two at a time.
The pair first chosen are rotated in their plane until they
are uncorrelated. Then the same is done to another pair,
and so on, the new uncorrelated variables being in turn
paired with others, until finally all correlations are zero.
(Kelley, 1935, Chapters I and VI.) A chief advantage is
that the components are obtained pari passu, and not
successively ; also, in certain circumstances where
Hotelling’s process converges very slowly, Kelley’s is quicker.
The end-results are the same.

3. Acceleration by powering the matrix. In a later paper
Hotelling pointed out that his process of finding the
loadings of the principal components can be much expedited
by analysing, not the matrix of correlations itself, but its
square, or fourth, eighth, or sixteenth power, got by
repeated squaring (Hotelling, 1935). Squaring a
symmetrical matrix is a special case of matrix multiplication
(see Chapter X, Section 4, page 145): it is done by finding
the “ inner products ” (see footnote, page 74) of each pair of
rows, including each row with itself, and setting the
results down in order. Applying this to the
matrix:


     1.0     .4     .4     .2
      .4    1.0     .7     .3
      .4     .7    1.0     .3
      .2     .3     .3    1.0


we see that the inner product of the first row with itself
is 1.36; of the first row with the second, 1.14; and so
on. Setting these down in order, we get for the squared matrix


    1.36   1.14   1.14    .64
    1.14   1.74   1.65    .89
    1.14   1.65   1.74    .89
     .64    .89    .89   1.22


Repeated squaring in this way carries us to the eighth power
of the original matrix, which is

    108.78   140.67   140.67    88.54
    140.67   182.03   182.03   114.61
    140.67   182.03   182.03   114.61
     88.54   114.61   114.61    72.38


and the square roots of its diagonal members are

     10.429   13.492   13.492    8.508

which are in the ratio

      .7730   1        1         .6306

very near indeed to the Hotelling final multipliers

      .772865   1      1         .629811

Hotelling gives a method of finding the residues, for the
purpose of calculating the next factor loadings, from the
“ powered ” matrix. But it may be so nearly perfectly
hierarchical that this fails unless an enormous number of 
decimals have been retained, and it is in practice best to 
go back to the original matrix and obtain the residues 
from it. Their matrix can in turn be squared, and so on. 
Other and very powerful methods of acceleration will 
be found described in Aitken, 19370. 
4. Properties of the loadings. If all the principal
components are calculated accurately, and if unities were used
in the diagonal cells, their loadings ought completely to 
exhaust the variance of each test ; that is, the sum of the 
squares of the loadings in each row should be unity. The 
sum of the squares of the loadings in each column equals 
the “latent root ” corresponding to that column, and the 
sum of the four latent roots is exactly equal to the number 
of tests. Each latent root represents the part of the whole 
variance of all the tests which has been “ taken out” by 
that factor. Thus the first factor “takes out” 55 per 
cent., the first two factors together 75.6 per cent., of the
variance of the original scores. The four factors account
for all the variance.

The correlations which correspond to the loadings given
in the table on page 111 are obtained by finding the
“ inner product ” of each pair of rows. Applying this to
the table we find the correlation r₂₄, say, to be

\[
r_{24} = .856836 \times .539645 - .135197 \times .826092 - .312332 \times .162323 - .387298 \times 0 = .300000
\]


In this way the loadings of the four principal com- 
ponents will exactly reproduce the correlations we began 
with. If, however, we have stopped the analysis after we 
have found only two principal components (or factors), 
these two would have reproduced the correlations only 
approximately. For example, for r₂₄ we should only
have

\[
.856836 \times .539645 - .135197 \times .826092 = .350702 \quad \text{instead of} \quad .300000
\]
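
These inner products are quickly verified. The sketch below takes the loadings from the table (with zero written for the two empty cells of the fourth factor) and reproduces a correlation from any chosen number of components; the function name `reproduced` is introduced here for illustration.

```python
# Loadings of the four tests on the four principal components, from the table.
loadings = {
    1: [0.662218, -0.323324, 0.675967, 0.0],
    2: [0.856836, -0.135197, -0.312332, -0.387298],
    3: [0.856836, -0.135197, -0.312332, 0.387298],
    4: [0.539645, 0.826092, 0.162323, 0.0],
}

def reproduced(i, j, factors=4):
    """Inner product of rows i and j of the table, using the first `factors` columns."""
    return sum(a * b for a, b in zip(loadings[i][:factors], loadings[j][:factors]))

r24_all_four = reproduced(2, 4)
r24_first_two = reproduced(2, 4, factors=2)
variance_of_test_1 = reproduced(1, 1)

print(round(r24_all_four, 6), round(r24_first_two, 6), round(variance_of_test_1, 6))
```

Taking a row with itself gives the variance of that test, which the four components together restore to unity.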


Before we leave the table of loadings, we may note that 
the signs of any column of the loadings can be reversed 
without changing either the variances or the correlations. 
Reversing the signs in a column merely means that we 
measure that factor from the opposite end, as we might 
rank people either for intelligence or stupidity and get the 
same order, but reversed. We will usually desire to call 
that direction of a factor positive which most conforms 
with the positive direction of the tests themselves, and 
therefore we will usually make the largest loading in each
column positive.
All the loadings of the first principal factor are, in an 
ordinary set of tests, positive. Of the other loadings, 
about half are negative. 

5. Calculation of a man’s principal components.
Factors obtained by using unities, and not communalities, 
in the diagonal cells have an important advantage. They 
can be calculated exactly from a man’s scores, whereas 
communality factors can only be estimated. This is 
because the former are never more numerous than the tests, 
whereas the latter, including the specifics, are always more 
numerous than the tests. For the former, therefore, we 
always have just the same number of equations as un- 
knowns, whereas we have more unknowns than equations 
when communalities are used. 

We have hitherto given the analysis of tests into factors 
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in the form of tables of loadings. But we can alternatively 


55 


write them out as “ specification equations,” as we shall 
callthem. Thus the table on page 111 would be written— 


z₁ = .662218y₁ − .323324y₂ + .675967y₃
z₂ = .856836y₁ − .135197y₂ − .312332y₃ − .387298y₄
z₃ = .856836y₁ − .135197y₂ − .312332y₃ + .387298y₄
z₄ = .539645y₁ + .826092y₂ + .162323y₃


Here z₁, z₂, z₃, and z₄ stand for the scores in the four
tests, measured in standard units; that is, measured from
the mean in units of standard deviation. The factors
y₁, y₂, y₃, and y₄ are also supposed to be measured in such
units. These specification equations enable us to calculate 
any man’s standard score in each test if we know his 
factors, and since there are just as many equations as 
factors, they can be solved for the y’s and enable us to 
calculate, conversely, any man’s factors if we know his 
scores in the tests. The solution to these Hotelling equa- 
tions for the y’s happens to be peculiarly simple, as we 
shall prove in the Appendix, Section 7. It is as follows— 


y₁ = ( .662218z₁ + .856836z₂ + .856836z₃ + .539645z₄) ÷ 2.198090
y₂ = (−.323324z₁ − .135197z₂ − .135197z₃ + .826092z₄) ÷ .823526
y₃ = ( .675967z₁ − .312332z₂ − .312332z₃ + .162323z₄) ÷ .678388
y₄ = (           − .387298z₂ + .387298z₃            ) ÷ .300000


The table on page 111, therefore, serves a double purpose. 
Read horizontally it gives the composition of each test in 
terms of factors. Read vertically it gives the composition 
of each factor in terms of tests, if we divide the result by 
the root at the foot of the column.* 

Suppose, for example, that a man or child has the fol- 
lowing scores in the four tests— 


1.29   .36   .72   1.03


This is evidently a person above the average in each test, 
since the scores are all positive. His factors will be 


* If the analysis has been performed with “ reliabilities ” in the 
diagonal cells instead of units, the statement in the text still holds 
(Hotelling, 1933, 498). If on correlations corrected for “ attenua- 
tion,” the matter is more complicated (ibid. 499-502). 
obtained by substituting these scores for the z’s in the
above equations, with the result— 


y₁ = 1.062504
y₂ = .349441
y₃ = 1.034624
y₄ = .464757


(Of course, in practical work six decimal places would be 
absurd. They are given here because we are using this 
artificial example to illustrate theoretical points, in place 
of doing algebraic transformations, and they need, there- 
fore, to be exact.) 

If these values for the factors are now inserted in the
specification equations given above, the scores z in the tests
will be reproduced exactly (1.29, .36, .72, and 1.03).

Notice, too, that if we have stopped our analysis at less 
than the full number of principal components using unities 
in the diagonal cells, we can nevertheless calculate these 
factors for any person exactly. As soon as we have the 
first column of the table on page 111, we can calculate y₁ for
anyone whose scores z we know.

Had we done this with the person whose scores are given 
above, we should have summarized his ability in these four 
tests by the one statement— 

y₁ = 1.062504


This would have been an incomplete statement, but it 
is the best single statement that can be arrived at. 

6. Principal components in the common-factor space.—
Exactly the same iterative Hotelling process for finding the
principal components, the principal axes, of the ellipsoids
of density of the person-points can be applied to the table
of correlations with communalities in the diagonal cells.
The ellipsoidal swarm of person-points, in the full test space
with orthogonal axes for the tests, remains an ellipsoidal
swarm (though one of fewer dimensions) when projected 
on to the common-factor space. The mathematical reader 
will know this, or can work it out. The non-mathematical 
reader knows it well enough in the number of dimensions 
he is personally acquainted with : e.g. an egg, which is an 
ellipsoid of three dimensions, throws a shadow on a wall 
which is an ellipse, i.e. an ellipsoid of two dimensions. We
shall now analyse the same set of correlation coefficients
using the communalities .2667, .7, .7, .15, which we know,
from Chapter V, page 77, reduce the rank of the matrix
to two, and give an analysis with only two common factors.
We found on page 78 the two centroid common factors.
We shall now find the two principal components and find
them very similar.

7. Calculation with communalities


 .2667    .4      .4      .2         .7     .59    .5913
 .4       .7      .7      .3        1      1      1
 .4       .7      .7      .3        1      1      1
 .2       .3      .3      .15        .5     .45    .4435


1.0867   1.83    1.83     .815


Taking .7, 1, 1, .5 as a first guess at the multipliers, we find
the weighted sums of the columns to be as shown, and on
dividing through by 1.83 we get the next set of multipliers
.59, 1, 1, .45. Continuing in this way, we arrive quite
soon at .5913, 1, 1, .4435, which, when used as weights,
reproduce themselves. When reduced until the sum of
their squares equals 1.7696 (the largest column total with
these weights), the loadings are—


.4929   .8336   .8336   .3697


Subtracting the cross-products of these from the original 
matrix, and operating on the residues in exactly the same 
iterative way, we get for the second factor loadings— 


.1540   −.0712   −.0712   .1153, and no residues.*


* The sums of squares of the loadings (1.7694 and .0471) are the
two first latent roots of the matrix with communalities. The other
two latent roots are zero. The sum of the latent roots equals the
sum of the communalities, the “trace” of the matrix as it is called.


If we compare these principal component loadings with the
centroid loadings (page 78) obtained with the same com-
munalities, we see that they are very similar. But the
sum of squares of the loadings of the first principal com-
ponent (1.7694) is slightly larger than the same sum for the
first centroid loadings (1.7652). The principal components
take out at each stage the maximum possible variance
(sum of squares of loadings). The centroids nearly do 
so if the sign-changing is carefully done, but not quite. 
The centroids can best be looked at as approximations to the 
principal components, more easily calculated. In a bat- 
tery of many tests, say two dozen, and with any given 
communalities, the principal component process (“ weighted 
summation ”) will take out more variance in, say, six 
factors, and leave smaller residues, than will centroid 
factors.* But with the kind of data available in psychology, 
this advantage does not outweigh the disadvantage of 
longer calculation. 
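
The weighted-summation iteration of this section can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the stopping tolerance and the starting multipliers for the second factor (1, −1, −1, 1) are choices made here, not taken from the text.

```python
import math

# Correlations of the four tests, with communalities .2667, .7, .7, .15
# in the diagonal cells.
MATRIX = [
    [0.2667, 0.4, 0.4, 0.2],
    [0.4, 0.7, 0.7, 0.3],
    [0.4, 0.7, 0.7, 0.3],
    [0.2, 0.3, 0.3, 0.15],
]


def principal_factor(matrix, start, tol=1e-10, max_iter=1000):
    """Weight the rows by the multipliers, sum the columns, divide by the
    largest sum, and repeat until the multipliers reproduce themselves."""
    n = len(matrix)
    weights = list(start)
    largest = 0.0
    for _ in range(max_iter):
        sums = [sum(weights[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        largest = max(sums, key=abs)
        new_weights = [s / largest for s in sums]
        change = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    # Reduce the multipliers until their sum of squares equals the
    # largest column total (the latent root).
    scale = math.sqrt(largest / sum(w * w for w in weights))
    return [w * scale for w in weights], largest


def residues(matrix, loadings):
    """Subtract the cross-products of the loadings from the matrix."""
    n = len(matrix)
    return [
        [matrix[i][j] - loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]


first, root1 = principal_factor(MATRIX, [0.7, 1, 1, 0.5])
second, root2 = principal_factor(residues(MATRIX, first), [1, -1, -1, 1])
print([round(v, 4) for v in first], round(root1, 4))
print([round(v, 4) for v in second], round(root2, 4))
```

The same routine serves for both factors because the second factor is simply the first principal factor of the residues.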

8. Iterative methods.—Both in the above Hotelling calculation,
and in our discussion of communalities on page
88, we have seen examples of iterative processes, where a 
first guess at certain constants gives results which can be 
used as a better guess, which gives results which can be 
used as a still better guess, which gives . . . and so on 
and so on, until the stage is reached where the same con- 
stants emerge as were put in. This sort of process, where 
repetition after repetition converges to a steady result 
giving some maximum or minimum value to some quantity, 
is not uncommon in mathematics and is rather mysterious 
and magical to the layman. An analogy will perhaps 
assist understanding. Robinson Crusoe wants to make a 
lathe, but he has no wheels and spindles, and to make 
wheels and spindles he needs a lathe! He can, however, 
whittle crude makeshift wooden wheels, etc., with a knife,
and make a crude lathe with them, with which lathe he can
make somewhat better wheels and therefore a somewhat
better lathe, with which he can make still better wheels
. . . and so on, till he reaches perfection.


* If the Hotelling process is used with guessed communalities, and 
the whole is iterated (as was done with centroids on page 88) the 
communalities will converge to a set minimizing the sum of squares 
of the residuals for a given number of factors. The maximum likeli- 
hood method of Chapter IX arrives at communalities (I understand 
from Dr. Lawley) which minimize a weighted sum of squares of the 
residuals, each weight being the product of the reciprocals of the two 
specific variances concerned. 


CHAPTER VIII 
TESTING RESIDUES FOR SIGNIFICANCE 


1. The object of factorial analysis.—As was said in the first
section of Chapter I, the objects of factorial analysis are 
both practical and theoretical. The practical desire is to 
reduce the description of a man’s mind* to a comparatively 
few quantitative statements, instead of an unwieldy record 
of innumerable test scores, with a view to giving vocational 
or educational advice. The hope, on the theoretical side, 
is that the “factors” found may form the structure of a
theory of mind : and there are some who hope that physio- 
logical or neurological bases may be found for them. Our 
concern in this chapter is with the first point: how to 
reduce the number of “ factors” without sacrificing any 
significant fraction of the information. The insertion of 
communalities in the diagonal cells of a table of correla- 
tions is by many looked upon as one way of doing this, 
since it reduces the number of common factors. Simul- 
taneously, however, it creates and maximizes the influence 
ascribed to specific factors, and the total number of factors 
is increased, not diminished. This will not be discussed 
in the present chapter, which is concerned with another 
way of reducing the number of factors, applicable whether 
communalities or full variances are analysed. If the idea 
of communalities and specifics had never occurred to anyone,
it would still have been possible to reduce the number
of significant common factors to a number less than the 
number of tests. Each principal component, found as 
described in Chapter VII, causes the remaining residues to 
be as small as can be: and the centroid factors of Chapter 
V are nearly as good, if the sign-changing is done properly. 
If, after a few such factors have been extracted, the 
residues are so small as to be statistically negligible, we 

* Or of other objects of study, say in agriculture or in engineering.


See Chapter XII, Section 7. 
might as well stop the analysis, content with the few factors
extracted. We need, therefore, some test of statistical 
significance, applicable to such residual correlations, to 
know if they are negligible. 

2. The general idea of significance.—The general principle
of such a test of significance is this, that if the residues
we have found, or in practice some function of them, could 
only rarely have been produced by the action of chance 
sampling, we will assume that they are not due to sampling 
but to another factor. How we define “ rarely ” depends 
on circumstances. Usually in psychology “ once in twenty 
times ” (the 5 per cent. point as it is called) is rare enough 
to justify taking out another factor. The principle is
straightforward enough; the mathematical difficulty of
finding formulae for calculating the chances is, however, very
great, even for principal components with full variances,
and insuperable when the centroid method is used with 
guessed communalities. In consequence, a number of 
rule-of-thumb criteria have been put forward, to decide 
when to stop factorizing. 

3. Empirical rules for the number of factors.—Thurstone 
(1938a, 65 et seq.) discusses some of the earlier ones. A criterion
which appeals to common sense is based simply on the
algebraic sum of the residuals (excluding the diagonal cells) 
after as many as possible of their signs have been made 
positive by the process described in Chapter V (page 71). 
As long as this sum goes on sinking, factorization is continued.
When it flattens, the last factor taken out is
rejected and the process stopped. Mosier (1939) found this 
the best of five plans he tried, though none was wholly 
satisfactory. 

Ledyard Tucker’s criterion is that the ratio of the sums 
of the absolute values of the residuals, including the 
diagonal used, just after and just before the extraction of 
a factor must be less than (n — 1)/(n + 1) where n is the 
number of tests. 

Coombs’ criterion depends upon the number of negative 
signs left among the residuals after everything has been 
done to reduce them by sign-changing, in the centroid 
process. If they are few, another factor may be extracted. 
More exactly, the permissible number is given in this
table : 


Number of tests      10    15    20    25    30
Negative signs       31    79   149   242   358
Standard error. 10 ¶ 12 


A fuller table is given in Coombs’ article (1941). 

An example of the use of these two will be found in
Blakey (1940, 126). 

Quinn McNemar (1942), who considers both of the
above inadequate, gives a formula which includes N the 
size of the sample. He takes out factors until σ_s reaches
or falls below 1/√N, where


σ_s = σ ÷ (1 − M),
σ = st. dev. of the residuals after s factors,
M = mean communality for s factors.


Others go on until the distribution of the residuals 
ceases to be significantly skew (Swineford, 1941, 378). 
Reyburn and Taylor (1939) divide the residuals by the
probable errors of the original coefficients, and plot a
distribution of the results disregarding signs. If it is
significantly different from a normal curve of the same area
and with standard deviation 1.4825, they take out more
factors. Swineford (1941, 377) finds the correlation
between the original correlations and the corresponding 
residuals and takes out factors till it is not significant. 

Another method is based on the sinking of the factor 
loadings with each successive factor instead of on the dying 
away of the residuals. Guilford and Lacey (1947 in a 
U.S. Air Force report) stop factorizing when the product 
of the two highest factor-loadings falls below 1/√N.

P. E. Vernon, in a privately circulated manuscript, has 
tested some two dozen methods, as applied when the 
centroid or simple summation method of analysis is used 
with communalities, on two analyses of actual data, on 
645 and 994 cases respectively (Vernon, 1947). His final 
advice is to use the methods of Guilford and Lacey (pro- 
duct of the two highest factor loadings) and of Mosier 
(sum of the residuals), together with Burt’s empirical
formula for the standard error of each factor loading— 


σ_l = (1 − l²)√n ÷ √{N(n − s + 1)}


where l = loading, N = number of persons, n = number of
tests, s = the ordinal number of the factor. If half the 
loadings of a factor fall below twice their standard errors 
thus found, Vernon recommends rejection of the factor. 

If these three methods do not agree, Vernon would 
proceed to calculate McNemar’s σ_s (above), and would
decide on the evidence of the four criteria, taking out another 
factor if doubtful. 

4. More exact methods.—The earliest method was to
compare each residue with the standard error of the original
correlation coefficient and cease factorizing when the
residues all sank below twice these standard errors. But 
the use of the formula for the standard error of r is now 
frowned upon because of the skewness of the distri- 
bution. 

Moreover, sampling errors in the correlation coefficients, 
being themselves correlated, produce further factors ; and 
the above-mentioned test tended to stop the analysis too 
soon (Wilson and Worcester, 1939). These further factors
must be taken out in order to give elbow room for rotation 
of the axes to some psychologically significant position. 
For the error factors are not concentrated in the last 
factors taken out, but have been entangled with all. 
Usually more factors have to be taken out than can be 
expected, on rotation, to yield meaningful psychological 
factors, but all the dimensions are required nevertheless for 
the rotations. In geometrical terms, some of the dimen- 
sions of the common factor space will be due to sampling 
error, but not the particular dimensions indicated by the 
directions of the last factors to be extracted. In terms of 
Hotelling’s plan, the whole ellipsoid is distorted ; its small 
major axes are not necessarily due entirely to sampling, nor 
its large ones free from it. A χ² method is described by Wilson
and Worcester (1939, 189) which is, however, laborious 
when the number of tests is large. See also Burt (1940,
338-40). Lawley (1940, 76 et seq.) repeated Wilson and 
Worcester’s criticism and developed an accurate criterion 
described in the next chapter. This is probably the best 
plan to use in any research where great accuracy is necessary. 
And it is for the case where communalities are employed. 
It is, however, only legitimate when the factor loadings 
have been found by Lawley’s application of the method of 
maximum likelihood. 

Principal components lend themselves to exact treat- 
ment when full unities are used, i.e. there are no specifics 
assumed. Hotelling himself (1933, 437-41) discusses the 
matter of the number which are significant. Davis (1945) 
shows how to find the reliability of each principal compon- 
ent from the reliabilities of the tests, and finds that it may 
happen that a later component is more reliable than an
earlier one.

5. M. S. Bartlett's test of significance for principal com- 
ponents.—Recently (Bartlett, 1950) a method has been 
described for deciding the significance of principal com- 
ponent factors which, while it is unlikely, in its present 
form at least, to be usable in any ordinary cases, ought 
to be briefly described here. It is highly desirable that 
exact methods, or methods where the assumptions made 
and the approximations permitted are clearly realized and 
set out, should gradually replace those based on experience 
only. Bartlett’s method depends upon the latent roots of 
the matrix of correlation coefficients with unity in each 
diagonal cell—it is not applicable to communalities. 

Latent roots have been mentioned on page 111, where 
they appear as the sums of squares of the loadings of the 
tests in each principal component. In the example there 
used, their values are— 


λ₁ = 2.198
λ₂ = .824
λ₃ = .678
λ₄ = .300


They are equal in number to the tests, and their sum also 
is exactly 4. Bartlett forms quantities R as follows:


                                                  log_e R
R₂ = λ₃λ₄ ÷ {(4 − λ₁ − λ₂)/2}²  = .8506         −0.16182
R₃ = λ₂λ₃λ₄ ÷ {(4 − λ₁)/3}³     = .7734         −0.25696
R₄ = λ₁λ₂λ₃λ₄                   = .3684         −0.99858


and of these we require the natural logarithms, which are
2.3026 times the usual logarithms to the base ten. They
are given above. These logarithms, multiplied by a certain
coefficient, are an approximation to χ² for the successive
factors. The coefficient is

−{n − (2p + 5)/6 − 2k/3}

where n is the number of persons tested less one, p is the
number of latent roots, i.e. of tests, and k is the number of
factors already dealt with, i.e. it takes in turn the values
0, 1, 2.

In our example p = 4. If we assume that the number of 
persons tested was 20, so that Bartlett’s n = 19, we can 
make this table : 


k    d.f.              χ²                               5 per cent. level
0    3 + 2 + 1     −16.833 × (−.99858) = 16.8095       12.59
1    2 + 1         −16.167 × (−.25696) =  4.1542        7.82
2    1             −15.500 × (−.16182) =  2.5082        3.84


The quantities in the last column are to be obtained from
a χ² table, entered with the number of degrees of freedom
(d.f.) shown. Only the first factor is significant (16.8095
being greater than 12.59).
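
The whole of the above table can be recomputed from the four latent roots. The sketch below is an illustration only: the function name is hypothetical, and the coefficient −{n − (2p + 5)/6 − 2k/3} is the form that reproduces the multipliers 16.833, 16.167 and 15.500 used in the table.

```python
import math


def bartlett_chi_squares(roots, n):
    """roots: latent roots in descending order (unities in the diagonal).
    n: number of persons tested less one.
    Returns a list of (k, degrees of freedom, approximate chi-square)."""
    p = len(roots)
    results = []
    for k in range(p - 1):  # the last root of all is not dealt with
        remaining = roots[k:]
        mean_remaining = sum(remaining) / (p - k)
        r = math.prod(remaining) / mean_remaining ** (p - k)
        coefficient = -(n - (2 * p + 5) / 6 - 2 * k / 3)
        df = (p - k) * (p - k - 1) // 2
        results.append((k, df, coefficient * math.log(r)))
    return results


ROOTS = [2.198, 0.824, 0.678, 0.300]
table_20 = bartlett_chi_squares(ROOTS, n=19)  # 20 persons
table_29 = bartlett_chi_squares(ROOTS, n=28)  # 29 persons
for k, df, chi2 in table_20:
    print(k, df, round(chi2, 2))
```

Since the roots are entered here to three decimals only, the values agree with the printed table to about two decimal places rather than four.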

If we had assumed 29 children (n = 28) we should have
been puzzled by a peculiar result. The three values of χ²
are then 25.80, 6.47, and 3.96, so that it looks as though the
first factor and the third factor are significant, with the
factor in between not significant!* But Bartlett warns
(1950, 78) that this χ² test is only valid if the roots
already removed are significant. As soon as we come
to a non-significant factor, the later factors are also non-
significant. The last factor of all is not dealt with.
“Merely the correlation structure of the variables is being
investigated in its relation to variance,” says Bartlett
(page 80). “For this reason no significance can ever be
attached to the last root, for it would be equivalent to
asking for the correlation structure of a single variable.”†


* Compare the report by Davis (1945) that a later component may 
be more reliable than an earlier one. 

† In a later paper (B. J. P. Statist. 4, p. 1) Bartlett warns that
after one or more significant components have been eliminated it is
safer to take as the number of degrees of freedom

½ (p − k − 1)(p − k + 2)
instead of
½ (p − k)(p − k − 1)
as used above. This would increase the degrees of freedom in the
second line of the analysis on page 125 from 3 to 5, and in the
third line from 1 to 2, and raise the 5 per cent. level.


CHAPTER IX 


THE MAXIMUM LIKELIHOOD METHOD OF 
ESTIMATING FACTOR LOADINGS * 


(by D. N. Lawley) 


1. Basis of statistical estimation.—In recent times attempts 
have been made to introduce into factorial analysis statis- 
tical methods developed in other fields of research. In 
particular the method of statistical estimation put forward 
by Fisher (1921, page 828 et seq.), and termed the method of 
maximum likelihood, has been applied by Lawley (1940, 
1941, 1948) to the problem of estimating factor loadings. 
This method has the property of using the largest amount 
of available information contained in the data and gives 
“efficient” estimates, where such exist, of all unknown
parameters, i.e. estimates which, roughly speaking, are on 
the average nearer the true values than those obtained by 
other, “ inefficient,” methods of estimation. 

Before using the maximum likelihood method for esti- 
mating factor loadings it is necessary to make certain 
initial assumptions. We assume that both the test scores 
and the factors, of which they are linear functions, are 
normally distributed throughout the population of indi- 
viduals to be tested. This assumption of normality has 
been the subject of some criticism, but in practice it would 
appear that departure from strict normality of distribution 
is not very serious. It is also necessary to make some 
hypothesis concerning the number of general factors 
which are present in addition to specifics. We shall later
on show how this hypothesis may be tested, and how it 
may be determined whether the number assumed is, in fact,
sufficient to account for the data.


* For a detailed exposition of the arithmetical procedure of
Lawley’s method, with checks, see Emmett (1949).


2. A numerical example.—In order to illustrate the calculations
needed we shall reproduce an example used by
Lawley (1943), where eight tests were given to 443 indi-
viduals. The table below gives the correlations between
the eight tests, unities having been placed in the diagonal 
cells. In this example the hypothesis made is that two 
general factors, together with specifics, are sufficient to 
account for the observed correlations. 


       1      2      3      4      5      6      7      8
1   1.000   .312   .405   .457   .500   .350   .521   .564
2    .312  1.000   .460   .316   .279   .173   .339   .288
3    .405   .460  1.000   .394   .380   .258   .433   .323
4    .457   .316   .394  1.000   .460   .222   .516   .486
5    .500   .279   .380   .460  1.000   .239   .441   .417
6    .350   .173   .258   .222   .239  1.000   .302   .262
7    .521   .339   .433   .516   .441   .302  1.000   .547
8    .564   .288   .323   .486   .417   .262   .547  1.000


The method of estimation about to be described is one 
of successive approximations. Each successive step in the 
calculations gives a set of factor loadings which are nearer 
to the final values than those of the previous set. To 
start the process it is only necessary to guess or to find by 
some means (e.g. by a centroid analysis) first approxima- 
tions to the factor loadings. Any set of figures within 
reason will serve the purpose, though, of course, the better 
the approximation the fewer steps in the calculation will 
be needed. For illustration we shall take as first approxi- 
mations to the factor loadings the set of values given below : 


                                     Tests
Trial
loading in     1      2      3      4      5      6      7      8
Factor I      .73    .50    .66    .66    .62    .40    .73    .70
Factor II     .17   −.27   −.47    .08    .06    .02    .10    .29
Specific
variance     .4382  .6771  .3435  .5580  .6120  .8396  .4571  .4259

Under the loadings are written the corresponding first 
approximations to the specific variances (the total variance 
of each test being taken to be unity). They are as usual 
found by subtracting from unity the sums of squares of 
the loadings for each test. 
The calculations necessary for obtaining second approximations
to the loadings in factor I may now be set out as
follows : 


(a)   1.666   .738  1.921  1.183  1.013   .476  1.597  1.644
(b)   5.647  3.895  5.132  5.129  4.830  3.100  5.647  5.412
(c)   4.917  3.395  4.472  4.469  4.210  2.700  4.917  4.712
            h₁² = 45.724        1/h₁ = .14789
(d)    .727   .502   .661   .661   .623   .399   .727   .697

The first row of figures, row (a), is found by dividing the
trial loadings in factor I by the corresponding specific
variances. The figures in row (b) are then given by the
inner products (see footnote, page 74) of row (a) with the
successive rows (or columns) of the correlation table
printed above, and row (c) is obtained by subtracting
from the figures in row (b) the corresponding loadings in
factor I. The quantity h₁² is given by the inner product
of rows (a) and (c), and hence, taking the square root of the
reciprocal of this quantity, we find 1/h₁. Finally, row (d)
is obtained by multiplying the figures in row (c) by 1/h₁,
or .14789. The resulting numbers are then second
approximations to the loadings of the tests in factor I.
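
The four rows (a) to (d) can be recomputed directly from the correlation table and the trial loadings. The following is a minimal sketch; the variable names are chosen here for illustration, and because intermediate figures are not rounded to three decimals the results may differ from the printed rows in the last place.

```python
import math

# Correlations between the eight tests, unities in the diagonal cells.
R = [
    [1.000, 0.312, 0.405, 0.457, 0.500, 0.350, 0.521, 0.564],
    [0.312, 1.000, 0.460, 0.316, 0.279, 0.173, 0.339, 0.288],
    [0.405, 0.460, 1.000, 0.394, 0.380, 0.258, 0.433, 0.323],
    [0.457, 0.316, 0.394, 1.000, 0.460, 0.222, 0.516, 0.486],
    [0.500, 0.279, 0.380, 0.460, 1.000, 0.239, 0.441, 0.417],
    [0.350, 0.173, 0.258, 0.222, 0.239, 1.000, 0.302, 0.262],
    [0.521, 0.339, 0.433, 0.516, 0.441, 0.302, 1.000, 0.547],
    [0.564, 0.288, 0.323, 0.486, 0.417, 0.262, 0.547, 1.000],
]
TRIAL_I = [0.73, 0.50, 0.66, 0.66, 0.62, 0.40, 0.73, 0.70]
TRIAL_II = [0.17, -0.27, -0.47, 0.08, 0.06, 0.02, 0.10, 0.29]


def dot(u, v):
    """Inner product of two rows."""
    return sum(a * b for a, b in zip(u, v))


# Specific variance = 1 - sum of squares of the trial loadings.
specific = [1 - a * a - b * b for a, b in zip(TRIAL_I, TRIAL_II)]

row_a = [l / v for l, v in zip(TRIAL_I, specific)]   # loadings / specific variances
row_b = [dot(row_a, row) for row in R]               # inner products with the table
row_c = [b - l for b, l in zip(row_b, TRIAL_I)]      # subtract the trial loadings
h_sq = dot(row_a, row_c)                             # h squared
row_d = [c / math.sqrt(h_sq) for c in row_c]         # second approximations

print(round(h_sq, 3))
print([round(v, 3) for v in row_d])
```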

The most direct way of obtaining second approximations 
to the loadings in factor II is to find the residual matrix 
which results from removing the effect of factor I, and to 
treat it in the same way as the original matrix, using this 
time the trial loadings in factor II. A less direct but con- 
siderably shorter method may, however, be obtained by using 
once more the original matrix and modifying the process 
slightly. The necessary calculations are as shown below: 


(e)    .388  −.399 −1.368   .143   .098   .024   .219   .681
(f)    .330  −.560  −.980   .150   .113   .038   .190   .580
            p₁ = −.0234
(g)    .177  −.278  −.495   .085   .068   .027   .107   .306
            k₁² = 1.1080        1/k₁ = .9500
(h)    .168  −.264  −.470   .081   .065   .026   .102   .291


Row (e) is found by dividing the trial loadings in factor II
by the corresponding specific variances (thus, .388 is
.17/.4382), while the numbers in row (f) are given by the
inner products of row (e) with the rows of the correlation
table.

The step by which row (g) is obtained from row (f) is 
a little more complicated than the corresponding step in 
the calculations for the first-factor loadings. From each 
number in row (f) we subtract not only the corresponding 
trial loading in factor II, but also a correction which 
eliminates the effect of factor I; this correction consists 
of the corresponding number in row (d) multiplied by 
−.0234, the inner product of rows (e) and (d). Thus, for
example, the number .177 in row (g) is equal to

.330 − .170 − .727 × (−.0234)

In general, where more than two factors are assumed to be 
present and where further approximations are being calcu- 
lated for the loadings in the rth factor, there will be (r — 1) 
such corrections to be subtracted, one for each of the 
preceding factors. 

Having found row (g) the quantity k₁² is now given by the
inner product of rows (e) and (g), from which, taking the
square root of the reciprocal, we derive 1/k₁. Row (h)
is then obtained by multiplying the figures in row (g) by
1/k₁, or .9500. We have thus found second approximations
to the loadings in factor II.
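
The shorter method for factor II, with its correction for factor I, can be sketched in the same way. The code below repeats the factor I step so that it is self-contained; all names are illustrative, and unrounded intermediate figures are carried, so small differences from the printed rows in the third or fourth decimal are to be expected.

```python
import math

R = [
    [1.000, 0.312, 0.405, 0.457, 0.500, 0.350, 0.521, 0.564],
    [0.312, 1.000, 0.460, 0.316, 0.279, 0.173, 0.339, 0.288],
    [0.405, 0.460, 1.000, 0.394, 0.380, 0.258, 0.433, 0.323],
    [0.457, 0.316, 0.394, 1.000, 0.460, 0.222, 0.516, 0.486],
    [0.500, 0.279, 0.380, 0.460, 1.000, 0.239, 0.441, 0.417],
    [0.350, 0.173, 0.258, 0.222, 0.239, 1.000, 0.302, 0.262],
    [0.521, 0.339, 0.433, 0.516, 0.441, 0.302, 1.000, 0.547],
    [0.564, 0.288, 0.323, 0.486, 0.417, 0.262, 0.547, 1.000],
]
TRIAL_I = [0.73, 0.50, 0.66, 0.66, 0.62, 0.40, 0.73, 0.70]
TRIAL_II = [0.17, -0.27, -0.47, 0.08, 0.06, 0.02, 0.10, 0.29]


def dot(u, v):
    """Inner product of two rows."""
    return sum(a * b for a, b in zip(u, v))


specific = [1 - a * a - b * b for a, b in zip(TRIAL_I, TRIAL_II)]

# Factor I: rows (a) to (d).
row_a = [l / v for l, v in zip(TRIAL_I, specific)]
row_b = [dot(row_a, row) for row in R]
row_c = [b - l for b, l in zip(row_b, TRIAL_I)]
row_d = [c / math.sqrt(dot(row_a, row_c)) for c in row_c]

# Factor II: rows (e) to (h), using the original correlation table.
row_e = [m / v for m, v in zip(TRIAL_II, specific)]
row_f = [dot(row_e, row) for row in R]
p1 = dot(row_e, row_d)  # correction multiplier for factor I
row_g = [f - m - p1 * d for f, m, d in zip(row_f, TRIAL_II, row_d)]
k_sq = dot(row_e, row_g)
row_h = [g / math.sqrt(k_sq) for g in row_g]

print(round(p1, 4), round(k_sq, 4))
print([round(v, 3) for v in row_h])
```

For a third factor a further correction term of the same form would be subtracted in row (g), one for each preceding factor, as the text states.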

The whole cycle of calculations may now be repeated 
over and over again until the required degree of accuracy 
is reached. In practice, provided that the initial trial 
loadings are not too far out, one repetition of the process 
will usually be found sufficient. In our example the final 
estimates (with possible slight errors in the last decimal 
place) were as follows : 


                                     Tests
Loading in     1      2      3      4      5      6      7      8
Factor I      .725   .503   .664   .661   .623   .399   .726   .694
Factor II     .172  −.261  −.468   .087   .069   .027   .106   .291
Specific
variance      .445   .679   .340   .556   .607   .840   .462   .434


Having obtained these figures, there is, of course, no 
objection to rotating the factors as desired in order to 
reach a psychologically acceptable position. 
3. Testing significance.—A difficulty in most systems of
factorial analysis is to know how many factors it is worth
while to “take out,” and to decide how many of them may
be considered significant. From a statistical point of 
view objections can be raised against the majority of 
methods at present in use for this purpose. When, how- 
ever, the number of individuals tested is fairly large, the 
maximum likelihood method provides a satisfactory means 
of testing whether the factors fitted can be considered 
sufficient to account for the data. 

To illustrate this let us return to the example of the 
previous section. It is first of all necessary to calculate 
the matrix of residuals obtained when the effect of both 
factors is removed from the original correlation matrix. 
For this purpose we use the final estimates of the loadings 
as already given. The residual matrix, with the specific 
variances inserted in the diagonal cells, is as follows : 


        1       2       3       4       5       6       7       8
1    (.445)  −.008    .004   −.037    .036    .056   −.024    .011
2    −.008   (.679)   .004    .006   −.016   −.021    .001    .015
3     .004    .004   (.340)  −.004   −.001    .006    .001   −.002
4    −.037    .006   −.004   (.556)   .042   −.044    .027    .002
5     .036   −.016   −.001    .042   (.607)  −.011   −.019   −.035
6     .056   −.021    .006   −.044   −.011   (.840)   .009   −.023
7    −.024    .001    .001    .027   −.019    .009   (.462)   .012
8     .011    .015   −.002    .002   −.035   −.023    .012   (.434)


We are now able to calculate a criterion, which we shall 
denote by w, for deciding whether the hypothesis that only 
two general factors are present should be accepted or 
rejected. Each of the above residuals is squared and 
divided by the product of the numbers in the corresponding 
diagonal cells. Thus, for example, the residual for 
Tests 4 and 7 is squared and divided by the product of 
the fourth and seventh diagonal elements, giving the result 


(.027)² ÷ (.556 × .462) = .002838


There are altogether 28 such terms, one for each residual, 
and w is obtained by forming the sum of these terms and 
multiplying it by 443, the number in the sample. The
result is found to be 20.1.

When the number in the sample is fairly large w is
distributed approximately as χ² with degrees of freedom
given by

½{(n − m)² − n − m}


where n is the number of tests and m is the assumed num- 
ber of factors. To test whether the above value of w is 
significant we now use a χ² table such as is given by
Fisher and Yates (1938, page 27). In our case, putting
n = 8 and m = 2, the number of degrees of freedom is 13.
Entering the χ² table with 13 degrees of freedom, we find
that the 1 per cent. significance level is 27.7. This means
that if our hypothesis that only two general factors are
present is correct, then the chance of getting a value of w
greater than 27.7 is only 1 in 100. If, therefore, we had
obtained a value of w greater than 27.7 we should have
been justified in rejecting the above hypothesis and in
assuming the existence of more than two general factors.
In our case, however, the value of w is only 20.1, well below
the 1 per cent. significance level. We have thus no
grounds for rejection, and although we cannot state that
only two general factors are present, we have no reason to
assume the existence of more than two.
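
The criterion w and its degrees of freedom can be sketched as below. The function names are illustrative. The sample size of 443 is an assumption taken from the divisor used in the standard-error example of Section 4, and since the residuals are entered to three decimals only, the computed w agrees with the printed 20.1 only roughly.

```python
# Final specific variances of the eight tests.
SPECIFIC = [0.445, 0.679, 0.340, 0.556, 0.607, 0.840, 0.462, 0.434]

# Residuals above the diagonal, keyed by (test i, test j), numbered from 1.
RESIDUALS = {
    (1, 2): -0.008, (1, 3): 0.004, (1, 4): -0.037, (1, 5): 0.036,
    (1, 6): 0.056, (1, 7): -0.024, (1, 8): 0.011,
    (2, 3): 0.004, (2, 4): 0.006, (2, 5): -0.016, (2, 6): -0.021,
    (2, 7): 0.001, (2, 8): 0.015,
    (3, 4): -0.004, (3, 5): -0.001, (3, 6): 0.006, (3, 7): 0.001,
    (3, 8): -0.002,
    (4, 5): 0.042, (4, 6): -0.044, (4, 7): 0.027, (4, 8): 0.002,
    (5, 6): -0.011, (5, 7): -0.019, (5, 8): -0.035,
    (6, 7): 0.009, (6, 8): -0.023,
    (7, 8): 0.012,
}


def lawley_w(residuals, specific, n_persons):
    """Each residual squared, divided by the product of the two specific
    variances concerned; the terms summed and multiplied by the sample size."""
    total = sum(
        r * r / (specific[i - 1] * specific[j - 1])
        for (i, j), r in residuals.items()
    )
    return n_persons * total


def degrees_of_freedom(n_tests, m_factors):
    """Half of {(n - m) squared - n - m}."""
    return ((n_tests - m_factors) ** 2 - n_tests - m_factors) // 2


term_47 = RESIDUALS[(4, 7)] ** 2 / (SPECIFIC[3] * SPECIFIC[6])
w = lawley_w(RESIDUALS, SPECIFIC, n_persons=443)  # 443 is an assumed sample size
df = degrees_of_freedom(8, 2)
print(round(term_47, 6), round(w, 1), df)
```

The value obtained is compared with the 1 per cent. point of χ² for 13 degrees of freedom (27.7), exactly as in the text.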

It must be emphasized that the method described above 
is not applicable if other, inefficient, estimates of the 
loadings are substituted for the maximum likelihood 
estimates. For the value of χ² would in that case be
greatly exaggerated, causing us to over-estimate its 
significance. For this reason we cannot, for example, 
use the method for testing the significance of the re- 
siduals left when factors have been fitted by the centroid 
method. 

4. The standard errors of individual residuals. A method
has now* been developed for finding the standard errors
of individual residuals. This should be useful when a few
of the residuals are very large, while the rest are small.
In such a case one or more of the residuals may be highly
significant, when tested individually, even though the
value of χ² does not attain significance. The method
ignores errors of estimation of the specific variances, which
are not, however, likely to be very large provided that the
number of tests in the battery is not too small.

* Lawley in the Proc. Roy. Soc., Edin., 1949.

Let us denote by $l_i$, $m_i$ the estimated loadings of the i-th
test in the first and second factors respectively (assuming
the existence of only two factors). Let $v_i$ be the specific
variance of the i-th test, and let us write

$$h = \sum_i \frac{l_i^2}{v_i}, \qquad k = \sum_i \frac{m_i^2}{v_i}$$

Then the standard error of the residual for the i-th and j-th
tests (i ≠ j) is given by

$$\sqrt{\frac{1}{N}\left(e_{ii}\,e_{jj} + e_{ij}^2\right)}$$

where N is the number in the sample,

$$e_{ii} = v_i - \frac{l_i^2}{h} - \frac{m_i^2}{k}$$

and

$$e_{ij} = -\frac{l_i l_j}{h} - \frac{m_i m_j}{k}$$


This formula may, of course, be easily extended to take
into account any number of factors.

Let us illustrate the use of the above formula with the
same numerical example as before. If we wish to test the
significance of the residual for the first and fourth tests
after removing two factors, we have

$$l_1 = .725 \qquad m_1 = .172 \qquad v_1 = .44479$$
$$l_4 = .661 \qquad m_4 = .087 \qquad v_4 = .55551$$
$$h = 6.7185 \qquad k = 1.0528$$

Hence

$$e_{11} = .33845 \qquad e_{44} = .48329 \qquad e_{14} = -.08554$$

and

$$\sqrt{\tfrac{1}{443}\left(e_{11}\,e_{44} + e_{14}^2\right)} = .0196$$

Thus the residual in question has a value of .037 with a
standard error of .020. It is clearly not significant.
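
As a check on the arithmetic, the standard error of this residual can be recomputed in a few lines. This is a sketch only: the function name is hypothetical, the divisor 443 is taken to be the number in the sample, and the value .087 for the second-factor loading of Test 4 is an assumption inferred from the quoted results rather than read directly from the text.

```python
import math


def residual_standard_error(l_i, m_i, v_i, l_j, m_j, v_j, h, k, n_sample):
    """Standard error of the residual for tests i and j after two factors."""
    e_ii = v_i - l_i ** 2 / h - m_i ** 2 / k
    e_jj = v_j - l_j ** 2 / h - m_j ** 2 / k
    e_ij = -l_i * l_j / h - m_i * m_j / k
    return math.sqrt((e_ii * e_jj + e_ij ** 2) / n_sample)


se_1_4 = residual_standard_error(
    l_i=0.725, m_i=0.172, v_i=0.44479,   # Test 1
    l_j=0.661, m_j=0.087, v_j=0.55551,   # Test 4 (m_j = .087 is an inferred value)
    h=6.7185, k=1.0528, n_sample=443,
)

# The residual itself is quoted as .037; compare it with about twice its standard error.
ratio = 0.037 / se_1_4
print(round(se_1_4, 4), round(ratio, 2))
```

A ratio below 2 agrees with the verdict that the residual is not significant.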
5. The standard errors of factor loadings. When maximum
likelihood estimation has been used, we are able to
find the standard errors of not only the residuals but also
the estimated factor loadings. Using the same notation as
in the preceding section, the sampling variance of $l_i$, the
loading of the i-th test in the first factor, is (assuming the
test to be standardized)

$$\frac{1}{N}\left(1 + \frac{1}{h}\right)\left\{1 - \tfrac{1}{2}\left(1 + \frac{1}{h}\right) l_i^2\right\}$$

where N is the number in the sample, and the standard error
is the square root of this.
The covariance between any two first factor loadings $l_i$
and $l_j$ is given by

$$-\frac{1}{2N}\left(1 + \frac{1}{h}\right)^2 l_i l_j$$


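The formulæ for the variances and covariances of the
subsequent factor loadings are more complex. Thus the
variance of $m_i$, the loading of the i-th test in the second
factor, is

$$\frac{1}{N}\left(1 + \frac{1}{k}\right)\left\{1 - \left(1 + \frac{1}{h}\right) l_i^2 - \tfrac{1}{2}\left(1 + \frac{1}{k}\right) m_i^2\right\}$$

while the covariance between $m_i$ and $m_j$ is

$$\frac{1}{N}\left(1 + \frac{1}{k}\right)\left\{-\left(1 + \frac{1}{h}\right) l_i l_j - \tfrac{1}{2}\left(1 + \frac{1}{k}\right) m_i m_j\right\}$$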


The results for the general case, where more than two 
factors have been assumed present, may be written down 
without difficulty. Each factor will give rise to one more 
term within the curly brackets than the preceding factor. 
It should be noted that the last of such terms, and that
alone, is multiplied by ½.

The variances and covariances of loadings in any factor 
are those for given values of the loadings in all preceding 
factors. 

It must be stressed that all the above results are applic- 
able only to the unrotated loadings. 

In our numerical example, we find

$$1 + \frac{1}{h} = 1.14884$$

$$1 + \frac{1}{k} = 1.9498$$

Hence the variance of $l_1$, for example, is

$$\frac{1.14884}{443}\left\{1 - \tfrac{1}{2} \times 1.14884 \times .725^2\right\} = .001810$$

while that of $m_1$ is

$$\frac{1.9498}{443}\left\{1 - 1.14884 \times .725^2 - \tfrac{1}{2} \times 1.9498 \times .172^2\right\} = .001617$$

Thus the loading of test 1 in the first factor is .725, with a
standard error of

$$\sqrt{.001810} = .043$$

and its loading in the second factor is .172, with a standard
error of

$$\sqrt{.001617} = .040$$
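
The two variances just quoted can be recomputed as follows. This is a minimal sketch, assuming the two-factor formulæ in the form that the worked figures imply (divisor 443, the number in the sample); the function name is hypothetical.

```python
import math


def loading_variances(l_i, m_i, h, k, n_sample):
    """Sampling variances of the first- and second-factor loadings of one test."""
    a = 1 + 1 / h   # 1 + 1/h
    b = 1 + 1 / k   # 1 + 1/k
    var_l = a / n_sample * (1 - 0.5 * a * l_i ** 2)
    # Only the last term inside the brackets carries the factor one-half.
    var_m = b / n_sample * (1 - a * l_i ** 2 - 0.5 * b * m_i ** 2)
    return var_l, var_m


var_l1, var_m1 = loading_variances(0.725, 0.172, h=6.7185, k=1.0528, n_sample=443)
se_l1 = math.sqrt(var_l1)
se_m1 = math.sqrt(var_m1)
print(round(var_l1, 6), round(var_m1, 6), round(se_l1, 3), round(se_m1, 3))
```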


6. Advantages and disadvantages.—To sum up: the 
chief advantage of the maximum likelihood method of 
estimating factor loadings is that it does lead to efficient 
estimates and does provide a means of deciding how many 
factors may be considered necessary. It unfortunately 
takes, however, much longer to perform than a centroid 
analysis, particularly when the battery of tests is a large 
one and when several factors are to be fitted. The chief 
labour of the process lies in the calculation of the various 
inner products ; although in this respect it does not differ 
greatly from Hotelling’s method of finding “ principal 
components.” The maximum likelihood method is thus 
likely to be most useful in cases where accurate estimation 
is desirable and where it is proposed to make a test of 
significance. 

The method also possesses the advantage of being 
independent of the units in which the test scores are 
measured. The same system of factors is therefore 
obtained whether the correlation or the covariance matrix 
is analysed. The loadings in the one case are directly 
proportional to those in the other.


CHAPTER X
THE ROTATION OF FACTORS


1. Is rotation necessary ?—The factors or axes arrived at 
by the centroid process (or as principal components) are 
not at all the same sort of things as the Spearman system 
and its extensions gave. The Spearman factors, though 
mathematical devices are used in calculating their loadings, 
have psychological meaning from the first. Their names 
indicate this: general intelligence, the verbal factor, etc.
There is no need for rotating them. 

With the other kind of factor, the case is different. As 
first obtained, they make no claim to have psychological 
meaning. Their virtue is a purely mathematical virtue— 
they each explain, in turn, as much as possible of the vari- 
ance of the tests, and arrive with as few common factors 
as possible at negligible residues. The loadings of the
first centroid* factor are usually all positive, and it runs as 
a positive factor through all the tests. But it is not as a 
rule identical with Spearman’s g. The succeeding cen- 
troid factors have each negative loadings in about half the 
tests, and are often referred to as bipolar factors. They 
may be looked upon as repeatedly classifying the tests into 
subgroups, and this classification may be expressed by a 
kind of family tree : 


Factor I                  All loadings positive
                                    |
                      +-------------+-------------+
                      |                           |
Factor II     Positive loadings           Negative loadings
                      |                           |
                +-----+-----+               +-----+-----+
                |           |               |           |
Factor III  Positive    Negative        Positive    Negative
            loadings    loadings        loadings    loadings

Not infrequently the sub-families into which this bipolar
classification analyses the tests will have something
psychological in common, and to that extent these factors in such
cases may claim to have psychological meaning. Much
depends on how the battery of tests is made up. And
such bipolar classification is more natural in tests of
temperament and character, where common speech has many
bipolar phrases (as brave-cowardly, modest-cheeky, etc.),
than in tests of an intellectual nature, though there too
bipolar pairs of words are found, like clever-stupid.

* This is the most convenient name, to avoid verbosity. But
unless it is otherwise stated, may it be understood that principal
components are equally referred to.

Many psychologists, however, especially if they tend to 
look upon factors as real mental entities, even perhaps with 
physiological causes, find it difficult to admit all those 
negative loadings. A mental ability or factor, they argue, 
is on the whole something which helps us to do things, not 
hinders. A few negative loadings they can understand : 
but not so many as half the loadings. So they wish to 
turn the centroid axes into positions where most of the 
loadings will be positive, and moreover positions to which
they can give psychological meaning, and which will be 
found and be recognizable in different batteries of tests. 
For this purpose the factor-analyst must be instructed in 
methods of rotating the centroid factors into new positions. 

2. Methods of rotation. One method, Alexander’s, has
already been described earlier in this book on pages 79
to 80. It was used by Alexander himself with excellent
effect (Alexander, 1935), but involves assuming (a) that the
communality of a certain test is entirely due to one factor;
(b) that the communality of a second test is entirely due
to this factor and one other; (c) and so on for r − 1 tests,
where r is the number of factors. The criterion of success
with this method is to see whether, when these assumptions 
are made, negative loadings disappear; and whether the 
consequent loadings of those tests about which no assump- 
tions are made are compatible with the psychologist’s 
psychological analysis of them. Alexander’s assumptions, 
however, cannot generally be made in a usual battery of 
tests, and other methods of rotation are required. The 
simplest plan is to rotate the factors two at a time in their 
own plane. An example will best explain this. 

3. Two-by-two rotation. Let us suppose that we have the
following set of loadings in eight tests for three factors:

Test     I₀     II₀    III₀     h²
1        .4     .4     .1      .33
2        .7     .3    −.4      .74
3        .7    −.2    −.3      .62
4        .9    −.1     .3      .91
5        .5     .2    −.2      .33
6        .8    −.4     .1      .81
7        .6     .5    −.2      .65
8        .5    −.3     .4      .50


Suppose further that we want to rotate to positions of the
three axes where there will be no negative loadings, or at
least only few, and those small. We shall do this taking
the axes two by two, and rotating each pair in its own plane.
Take first axes I₀ and II₀, where the subscripts indicate
that no rotation has yet taken place. Draw a diagram, using
the loadings on I₀ and II₀ as co-ordinates (Figure 21). We
can see at once that if we rotate the axes to new positions
I₁ and II₁ they will enclose all the test points in their
positive quadrant, and all the loadings on these two axes
will be positive. The position is, however, not unique, for
we could have rotated a little farther, or a little less,
than θ and still enclosed all the points. I have taken θ as
37°, with sine θ = .6 and cosine θ = .8.

[Figure 21.]

Consider now the point 5. Its co-ordinates on the former
axes were .5 and .2, and clearly its new co-ordinates are

    .5 cos θ − .2 sin θ = .28
and .5 sin θ + .2 cos θ = .46
These can be checked approximately on the diagram, and
this should always be done, at least by eye if not by
measurement. The new loadings of each of the tests can
be calculated in the same way, giving:

Test     I₁     II₁    Sum of squares
1       .08    .56        .32
2       .38    .66        .58
3       .68    .26        .53
4       .78    .46        .82
5       .28    .46        .29
6       .88    .16        .80
7       .18    .76        .61
8       .58    .06        .34

At this point two checks should be made: (1) The sum
of the squares of the loadings of any test in these two factors
should be unchanged, e.g. .08² + .56² = .4² + .4² for the
first test. (2) The inner product of any pair of rows should
not have altered. Thus, for the first two tests:

    .4 × .7 + .4 × .3 = .40
and
    .08 × .38 + .56 × .66 = .4000

It is sufficient to check only adjacent rows.
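
The rotation of one pair of axes, together with the two checks, may be sketched in code. The function names are hypothetical; the loadings are those of the eight tests on I₀, II₀ and III₀, and the sine and cosine are the .6 and .8 chosen above.

```python
def rotate_pair(rows, i, j, sin_t, cos_t):
    """Rotate columns i and j of a table of loadings in their own plane."""
    rotated = []
    for row in rows:
        new_row = list(row)
        new_row[i] = row[i] * cos_t - row[j] * sin_t
        new_row[j] = row[i] * sin_t + row[j] * cos_t
        rotated.append(new_row)
    return rotated


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


# Loadings on I0, II0, III0 for the eight tests.
LOADINGS = [
    [0.4, 0.4, 0.1],
    [0.7, 0.3, -0.4],
    [0.7, -0.2, -0.3],
    [0.9, -0.1, 0.3],
    [0.5, 0.2, -0.2],
    [0.8, -0.4, 0.1],
    [0.6, 0.5, -0.2],
    [0.5, -0.3, 0.4],
]

step1 = rotate_pair(LOADINGS, 0, 1, sin_t=0.6, cos_t=0.8)

# Check (1): the sum of squares of every row is unchanged.
sums_unchanged = all(
    abs(inner_product(old, old) - inner_product(new, new)) < 1e-12
    for old, new in zip(LOADINGS, step1)
)
# Check (2): inner products of adjacent rows are unchanged.
products_unchanged = all(
    abs(inner_product(LOADINGS[r], LOADINGS[r + 1])
        - inner_product(step1[r], step1[r + 1])) < 1e-12
    for r in range(len(LOADINGS) - 1)
)
point_5 = [round(x, 2) for x in step1[4]]
print(point_5, sums_unchanged, products_unchanged)
```

The third column is left untouched by the call, as it should be when only I₀ and II₀ are rotated.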


Our three axes are now I₁, II₁, and III₀, and III₀ still
has negative loadings. We must therefore rotate it with one
of the others, which will have its loadings further changed.
Let us choose I₁ and III₀ and with their loadings make this
diagram (Figure 22).

A little trial with a square corner of a piece of paper
shows us that we cannot rotate the axes to a position
which will completely enclose all the points, though
we very nearly can. We finally decide to make I₂ go exactly
through point 2, whose co-ordinates in this diagram are .38
and −.4. The sine and cosine of φ are therefore:

$$\frac{.4}{\sqrt{.38^2 + .4^2}} \quad \text{and} \quad \frac{.38}{\sqrt{.38^2 + .4^2}}$$

or .725 and .689
(check that .725² + .689² = unity)

The loadings of the point 5, for example, are then:

    .689 × .28 − .725 × (−.2) = .338 on I₂
and .725 × .28 + .689 × (−.2) = .065 on III₁

as can be approximately checked by a look at the diagram.
In the same way the other loadings on I₂ and III₁ can be
found, giving the complete table:


Test     I₂     II₁    III₁      h²
1      −.017   .560    .127    .3300
2       .552   .660    .000    .7403
3       .686   .260    .286    .6200
4       .320   .460    .772    .9100
5       .338   .460    .065    .3301
6       .534   .160    .707    .8106
7       .269   .760   −.007    .6500
8       .110   .060    .696    .5001

The sums of squares of each row ought to give the same
values for h² as did the original table in I₀, II₀, and III₀.
And the inner product of any pair of rows ought to be
identical also. For example, taking the last pair (it is
sufficient to check adjacent rows), we have from this table:

    .269 × .110 + .760 × .060 − .007 × .696 = .0703
and from the other:
    .6 × .5 − .5 × .3 − .2 × .4 = .07


We have now succeeded in replacing our original analysis, 
which had many negative loadings, by one which has only 
positive loadings (except for the two loadings which, 
although negative, are nearly zero), and gives the same
correlations and communalities.

4. An orthogonal rotating matrix. If the reader will, in
imagination, picture in his mind those original axes I₀, II₀,
and III₀ as three lines at right angles to each other
(orthogonal, as we say), he can further imagine them being
turned bodily, using their common meeting-place as the
swivelling-point and keeping them orthogonal, into their
final positions I₂, II₁, and III₁. Actually we did it in two
steps, but imagine it happening as one complex movement.

Arithmetically, this one movement can be imitated by 
“ post-multiplying the original table of loadings by an 
orthogonal matrix,” a piece of jargon we must hasten to 
explain. And the reader may miss this section out on 
first reading. A matrix, in mathematics, is an oblong or 
square set of numbers, to be used as an operator on other 
quantities. In our case it is to be used to rotate the original 
loadings to new positions. And since we want the axes to 
remain orthogonal, we have to use an orthogonal matrix, 
i.e. one in which the sum of the squares of any column or 
row is unity, and the inner product of any pair of rows or 
of columns is zero. Actually the orthogonal matrix which 
performs the rotation of the above section 3 is:

$$\begin{pmatrix} .5512 & .6000 & .5800 \\ -.4134 & .8000 & -.4350 \\ -.7250 & .0000 & .6890 \end{pmatrix}$$

(The reader can check the sum of squares of any column or
row, and any inner product of a pair.) Before explaining
how these numbers are arrived at, let us first perform the
post-multiplication of the table of original loadings (itself an
oblong matrix) by this rotating matrix:

$$\begin{pmatrix} .4 & .4 & .1 \\ .7 & .3 & -.4 \\ .7 & -.2 & -.3 \\ .9 & -.1 & .3 \\ .5 & .2 & -.2 \\ .8 & -.4 & .1 \\ .6 & .5 & -.2 \\ .5 & -.3 & .4 \end{pmatrix} \times \begin{pmatrix} .5512 & .6000 & .5800 \\ -.4134 & .8000 & -.4350 \\ -.7250 & .0000 & .6890 \end{pmatrix} = \begin{pmatrix} -.017 & .560 & .127 \\ .552 & .660 & .000 \\ .686 & .260 & .286 \\ .320 & .460 & .772 \\ .338 & .460 & .065 \\ .534 & .160 & .707 \\ .269 & .760 & -.007 \\ .110 & .060 & .696 \end{pmatrix}$$
We have to say post-multiplication because in matrix
algebra the product AB is not the same as the product BA.
Matrix multiplication is performed by finding the inner
product of each row of the first matrix with each column of
the second matrix. Thus

    .4 × .5512 − .4 × .4134 − .1 × .7250 = −.0174 or −.017

the first item in the product matrix above. Similarly, the
quantity .707, which appears in the sixth row and third
column of the product matrix, is the inner product of the
sixth row of the first matrix and the third column of the
second:

    .8 × .5800 + .4 × .4350 + .1 × .6890 = .7069 or .707

The reader can similarly check the other entries in the
product matrix.
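
For readers who prefer to let a machine do the row-by-column work, here is a small sketch of the post-multiplication. The helper name is hypothetical; the numbers are the original loadings and the rotating matrix of this section.

```python
def post_multiply(table, matrix):
    """Inner product of each row of `table` with each column of `matrix`."""
    columns = list(zip(*matrix))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in table]


LOADINGS = [
    [0.4, 0.4, 0.1],
    [0.7, 0.3, -0.4],
    [0.7, -0.2, -0.3],
    [0.9, -0.1, 0.3],
    [0.5, 0.2, -0.2],
    [0.8, -0.4, 0.1],
    [0.6, 0.5, -0.2],
    [0.5, -0.3, 0.4],
]
ROTATION = [
    [0.5512, 0.6000, 0.5800],
    [-0.4134, 0.8000, -0.4350],
    [-0.7250, 0.0000, 0.6890],
]

rotated = post_multiply(LOADINGS, ROTATION)
for number, row in enumerate(rotated, start=1):
    print(number, [round(x, 3) for x in row])
```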

When we performed the first of our previous two-by-two 
rotations we were in effect post-multiplying the loadings by 
the rotating matrix— 


$$\begin{pmatrix} .8 & .6 & 0 \\ -.6 & .8 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

which will leave the column III₀ unchanged because of the
nature of the third column of this rotating matrix. The
inner product of 0, 0, and 1 with any row of the centroid
loadings will give a column of loadings identical with III₀.

When we performed the second two-by-two rotation, of
I₁ and III₀, we were in effect multiplying by the matrix

$$\begin{pmatrix} .689 & 0 & .725 \\ .000 & 1 & .000 \\ -.725 & 0 & .689 \end{pmatrix}$$

which clearly does not alter the middle axis. And the
rotating matrix which would have done these two
operations simultaneously is:

$$\begin{pmatrix} .8 & .6 & 0 \\ -.6 & .8 & 0 \\ 0 & 0 & 1 \end{pmatrix} \times \begin{pmatrix} .689 & 0 & .725 \\ .000 & 1 & .000 \\ -.725 & 0 & .689 \end{pmatrix} = \begin{pmatrix} .5512 & .6000 & .5800 \\ -.4134 & .8000 & -.4350 \\ -.7250 & .0000 & .6890 \end{pmatrix}$$
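
The product of the two single rotations, and the orthogonality of the result, can be verified with a brief sketch. The helper names are hypothetical. Because .725 and .689 are rounded values, the sums of squares come out as unity only to about three decimal places, so a tolerance is used.

```python
def matmul(a, b):
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


FIRST = [[0.8, 0.6, 0.0], [-0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]           # rotates I0 and II0
SECOND = [[0.689, 0.0, 0.725], [0.0, 1.0, 0.0], [-0.725, 0.0, 0.689]]  # rotates I1 and III0

combined = matmul(FIRST, SECOND)

# An orthogonal matrix multiplied by its own transpose gives the unit matrix.
gram = matmul(transpose(combined), combined)
max_departure = max(
    abs(gram[r][c] - (1.0 if r == c else 0.0)) for r in range(3) for c in range(3)
)
print([[round(x, 4) for x in row] for row in combined], round(max_departure, 4))
```

The order of the factors matters: the matrix of the first rotation stands on the left.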
5. Reyburn and Taylor's method. These South African
psychologists have proposed to let psychological insight
alone guide the rotations to which axes are subjected.
They do not necessarily insist on a g (see their 1941a, pages
258, 254, 258, etc.). Their plan is to choose a group of
tests which their psychological knowledge, and a study of 
all that is previously known, leads them to consider to be 
clustered round a factor. They therefore cause one of their 
axes to pass through the centroid of this cluster, keeping all 
axes orthogonal. This factor axis they do not subse- 
quently move. They then formulate a hypothesis about 
a second factor and select a second group of tests, through 
whose centroid (retaining orthogonality) they pass their 
second factor axis. And so on. There is some affinity 
between this and Alexander’s method of rotation (see 
page 79). 

The arithmetical details of their method are as follows. 
They first obtain a table of centroid loadings in the usual 
way. Then, having chosen a group of tests which they 
think form, psychologically, a cluster, they add together 
the rows of the centroid table which refer to those tests, 
thus obtaining numbers proportional to the loadings of 
their centroid. These, after being normalized, form the 
first column of their rotating matrix. For example, 
consider this (imaginary and invented) table of loadings:

             Loadings
Test      I      II     III      h²
1        .4     .3      .1      .26
2        .5    −.3     −.6      .70
3        .6    −.3     −.3      .54
4        .5     .2      .1      .30
5        .4     .4     −.2      .36
6        .5    −.4      .2      .45
7        .5     .2     −.1      .30
8        .7     .4      .1      .66
9        .7    −.2      .3      .62
10       .6    −.4      .4      .68
Reyburn and Taylor now decide, let us suppose, that Tests
9 and 10 are, in their psychological view, very strongly
impregnated with a verbal factor, and determine to rotate
their original factors until one of them passes through the
centroid of these two tests. They extract their rows, add
them together, and normalize the three totals thus:

(9)       .7     −.2     .3
(10)      .6     −.4     .4
         1.3     −.6     .7     Sum of squares 2.54 = 1.594²
         .816   −.376   .439    obtained by dividing by 1.594


If the columns of the original table are multiplied by these 
three numbers and the rows added, the result is the first 
column of the rotated factor loadings in the table below. 
To get the other two columns we must complete the rotating 
matrix in such a manner that the axes remain orthogonal. 
How this is done will be explained separately later. 
Meanwhile, consider the matrix

$$\begin{pmatrix} .816 & .399 & .417 \\ -.376 & -.183 & .909 \\ .439 & -.898 & .000 \end{pmatrix}$$

Its first column is composed of the above numbers. It is 
orthogonal, for the sum of the squares of any row or column 
is unity, and the inner product of any two is zero. When 
the original table of loadings is post-multiplied by this we 
get the rotated table : 


Test       Rotated Loadings         h²
1       .258    .015    .440      .260
2       .257    .793   −.064      .699
3       .471    .564   −.022      .540
4       .377    .073    .390      .300
5       .088    .266    .530      .359
6       .646    .093   −.155      .450
7       .289    .253    .390      .300
8       .465    .116    .656      .660
9       .778    .047    .110      .620
10      .816   −.047   −.113      .681

At this point the usual two checks must be made, of h² and
of the inner products of consecutive rows.

The first factor now goes through the centroid of Tests 
9 and 10, and we scan the loadings it has in the other 
tests to see if these are consistent with their psychological 
nature. For instance, Test 5 has practically no loading on 
this verbal factor—is this consistent with our psychological 
opinion of this test ? 

If this scrutiny is satisfactory, the psychologist using
this method then proceeds to consider where he will place 
his second factor ; for the second and third columns of the 
above loadings have still no necessary psychological mean- 
ing as they stand. Exactly the same procedure is carried 
out with them, the first column being left unaltered. 
Suppose the psychologist decided on Tests 5, 7, 8 as being
a cluster round (say) a numerical factor. He adds their
rows:

(5)      .266     .530
(7)      .253     .390
(8)      .116     .656
         .635    1.576
         .374     .928    when normalized

and uses their normalized totals as the first column of a
matrix to rotate these last two columns. The matrix
must be orthogonal, and it is in fact

$$\begin{pmatrix} .374 & .928 \\ .928 & -.374 \end{pmatrix}$$

When the second and third columns are rotated by
post-multiplication by this, the final result is given below.
(The same checks must now be repeated.) The psychologist
now scans column two to see if the loadings of his
numerical factor agree reasonably with his idea of each
test, and is rather sorry to see two negative loadings, but
consoles himself by thinking that they are small. He
must finally try to name his third factor, present to an
appreciable extent only in tests 2 and 3. If he thinks he
recognizes it, he is content.

Test     Final Rotated Loadings
1       .258    .414   −.151
2       .257    .237    .760
3       .471    .191    .532
4       .377    .389   −.078
5       .088    .591    .049
6       .646   −.109    .144
7       .289    .457    .089
8       .465    .652   −.138
9       .778    .120    .002
10      .816   −.122   −.001

6. Special orthogonal matrices. To carry out the
above process the reader needs to have at his disposal
orthogonal matrices of various sizes, such that he can give
the first column any desired values. The following will
serve his purpose. Except for the first one, they are not
unique, and alternatives can be made.

Order 2

$$\begin{pmatrix} a & b \\ b & -a \end{pmatrix}, \qquad a^2 + b^2 = 1$$

Order 3

$$\begin{pmatrix} mq & mp & l \\ -lq & -lp & m \\ p & -q & 0 \end{pmatrix}, \qquad l^2 + m^2 = 1, \quad p^2 + q^2 = 1$$

It was from this formula that the matrix used in the last
section, with first column of .816, −.376, .439, was made.
For if we set

            p = .439
we have     q = .898
and from   mq = .816
we have     m = .909
and thence  l = .417
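
The making of the first rotating matrix of the last section can be sketched as below. This is an illustration under stated assumptions: the order-3 pattern (rows mq, mp, l; −lq, −lp, m; p, −q, 0) is the one implied by the numbers quoted, the function names are hypothetical, and the third entry of the chosen column must not be ±1, since q would then vanish. Because the sketch carries full precision throughout, some entries differ in the third decimal place from the rounded figures printed in the text.

```python
import math


def normalize(values):
    length = math.sqrt(sum(x * x for x in values))
    return [x / length for x in values]


def order3_matrix(first_column):
    """Orthogonal 3 x 3 matrix whose first column is the given unit vector."""
    mq, minus_lq, p = first_column
    q = math.sqrt(1 - p * p)
    m = mq / q
    l = -minus_lq / q
    return [[m * q, m * p, l],
            [-l * q, -l * p, m],
            [p, -q, 0.0]]


# Rows of Tests 9 and 10 added together: the centroid of the chosen cluster.
totals = [0.7 + 0.6, -0.2 - 0.4, 0.3 + 0.4]
first = normalize(totals)
rotation = order3_matrix(first)

# Orthogonality: inner products of columns are 0, sums of squares are 1.
columns = list(zip(*rotation))
max_departure = max(
    abs(sum(x * y for x, y in zip(columns[r], columns[c])) - (1.0 if r == c else 0.0))
    for r in range(3) for c in range(3)
)
print([round(x, 3) for x in first], round(max_departure, 12))
```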
Order 4      | a   −b   −c   −d |


This one was used by Reyburn and Taylor in their 1939 
article (page 159). 

Similar matrices of higher order can be made by a 
recipe given by them, viz. multiplying together two or 
more of the above, suitably extended by ones and zeros. 
For example, a matrix orthogonal and with arbitrary first 
column, of order 5, can be made by multiplying together : 


Hp -A a m —Iq p 
un —ìrn —ọ mp p —q 
A u oe [eon ee | m 


where l + m? =p? + gq? =r? + py! =r? + o? = 1. 

7. Principles deciding where to stop rotation. We have 
mentioned two principles, (a) the desire to rotate to posi- 
tions where there will be few, if any, negative loadings 
—but usually this is insufficient to define a final position 
uniquely, and (b) Reyburn and Taylor’s plan of following 
their psychological intuition in placing the axes. They 
too accept the need for mainly positive loadings, and they 
keep their axes at right angles. We turn now, in our next 
chapter, to a principle (Simple Structure) which is accepted 
widely in America, though hardly at all in Great Britain. 


CHAPTER XI
SIMPLE STRUCTURE


1. Agreement of mathematics and psychology. It is clear
that the whole process of multifactor analysis is one by 
which a definition of the primary factors is arrived at by 
satisfying simultaneously certain mathematical principles 
and certain psychological intuitions. When these two 
sides of the process click into agreement, the worker has a 
sense of having made a definite step forward. The two 
support one another. Obviously the goal to be hoped for 
along this line of advance will be the discovery of some 
mathematical process which always leads to a unique set of 
factors mainly acceptable to the psychologist. If such 
could be discovered and found to produce a few factors 
over and above those recognized as already known by other 
means, the new factors would stand a good chance of 
acceptance on the strength of their mathematical descent 
only. And no doubt the psychologist would be prepared 
to make a few concessions and changes in his previous ideas 
to fit in with any mathematical scheme which already gave 
much satisfaction and was objective and unique in its 
results. 

It is here that Thurstone’s notion of “ simple structure ” 
is offered as a solution (Vectors, Chapters 6-8). This idea is 
that the axes are to be rotated until as many as possible of 
them are at right angles to as many as possible of the 
original test vectors ; and that the battery is not suitable 
for defining factors unless such a rotation is uniquely 
possible, a rotation which will leave every axis at right 
angles to at least as many tests as there are factors, and 
every test at right angles to at least one axis. 

When the vectors of a test and a factor are at right 
angles, the loading of the factor in that test is zero. 
Thurstone’s “ simple structure ” is therefore indicated by 
a large number of zeros in the matrix of loadings, so large 
that there will be only one position of the axes (if any)
which satisfies the requirement. His search, be it repeated,
is for a set of conditions which will make the solution
unique. We have seen him approaching this goal by
stages. Unless the battery is large, so that

$$n > \frac{(2r + 1) + \sqrt{8r + 1}}{2}$$

(see Chapter V, Section 9), the communalities are not
unique. Even when the battery is large enough, the axes 
representing factors may be rotated to positions among 
which there is no one specially marked out. Then comes 
the demand that there be this large number of zero loadings. 
Most batteries of tests will not allow this demand to be 
satisfied, but with some it can just be attained. Only 
these last, it is Thurstone’s conviction, are suitable for 
defining primary factors, and it is his faith that the factors 
thus mathematically defined will be found to be acceptable 
as psychologically separable unitary traits. 
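
The size of battery demanded by the inequality above is easily tabulated. The following is only a sketch with a hypothetical function name; it evaluates the right-hand side of the inequality for a few values of r, the number of factors.

```python
import math


def battery_size_bound(n_factors):
    """Right-hand side of the inequality: {(2r + 1) + sqrt(8r + 1)} / 2."""
    return ((2 * n_factors + 1) + math.sqrt(8 * n_factors + 1)) / 2


bounds = {r: battery_size_bound(r) for r in (1, 3, 6)}
for r, bound in bounds.items():
    print(r, bound)
```

For three factors the bound is exactly six, so a battery must contain more than six tests before the inequality is satisfied.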

2. An example of six tests of rank 3.—To make our 
remarks more definite and concrete, let us suppose that 
we have a battery of six tests whose matrix of correlations 
can be reduced to rank 3. In practice, of course, six tests 
are far too few, and more than three factors quite likely. 
The matrix of loadings given by the “ centroid ” system 


contains at first negative quantities. Thus from the 
correlations : 


        1      2      3      4      5      6
1       ·    .525   .000   .000   .448   .000
2     .525     ·    .098   .306   .349   .000
3     .000   .098     ·    .133   .314   .504
4     .000   .306   .133     ·    .000   .000
5     .448   .349   .314   .000     ·    .307
6     .000   .000   .504   .000   .307     ·

with the communalities

      .674   .634   .558   .415   .490   .493

we get by the “centroid” process the matrix of loadings:

It is the factor axes indicated by these loadings that
Thurstone wishes to rotate until there are no negative 
loadings and enough zero loadings to make the position 
uniquely defined. For this last purpose he finds, empiri- 
cally, that it is necessary to require— 

(a) At least one zero loading in each row;

(b) At least as many zero loadings in each column as
there are columns (here three);

(c) At least as many XO or OX entries in each pair
of columns as there are columns. By an XO entry is
meant a loading in the one column opposite a zero in the
other.

“ At least one zero loading in each row.” This means 
that no test may contain all the common factors. In
making up the battery, then, the experimenter, with some 
idea in his mind as to what the factors are, will endeavour 
to ensure that they are not all present in any one test. 
This would, for example, exclude from a Thurstone battery 
(except as an extra) any very mixed group test, or a mixed 
test like the Binet-Simon which is itself a whole battery 
of varied items. 

“At least as many zeros in each column as there are
columns,” that is, as there are common factors. This 
means that in a Thurstone battery no factor may be general, 
but must be missing in several tests. 

The requirement as to the number of XO or OX entries 
is intended to ensure that the tests are qualitatively 
distinct from one another. 

Now, these requirements cannot generally be met by a 
matrix of loadings. It will in general be impossible to
rotate the axes (keeping them orthogonal) until every
axis is at right angles to r test vectors. The above
example has, however, been constructed so that this can
be done.

The correlations were in fact made from the loadings

         A       B       C
1        ·       ·     .821
2        ·     .475    .639
3      .718    .206      ·
4        ·     .644      ·
5      .438      ·     .546
6      .702      ·       ·

and the centroid loadings must therefore be capable of
being rotated rigidly into this form, retaining
orthogonality.
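
That these loadings reproduce the correlations, and that they satisfy requirements (a), (b) and (c), can be confirmed with a short sketch. The function names are hypothetical, and the blank cells of the table are entered as zeros.

```python
from itertools import combinations

# Loadings on A, B, C for the six tests; blank cells entered as zero.
STRUCTURE = [
    [0.0, 0.0, 0.821],
    [0.0, 0.475, 0.639],
    [0.718, 0.206, 0.0],
    [0.0, 0.644, 0.0],
    [0.438, 0.0, 0.546],
    [0.702, 0.0, 0.0],
]


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def correlation(i, j):
    """Correlation of tests i and j (numbered from 1) implied by the loadings."""
    return inner_product(STRUCTURE[i - 1], STRUCTURE[j - 1])


communalities = [round(inner_product(row, row), 3) for row in STRUCTURE]

n_columns = 3
zeros_per_row = [sum(1 for x in row if x == 0.0) for row in STRUCTURE]
zeros_per_column = [sum(1 for x in col if x == 0.0) for col in zip(*STRUCTURE)]
# XO or OX entries: exactly one of the two loadings in a row is zero.
xo_per_pair = {
    (a, b): sum(1 for row in STRUCTURE if (row[a] == 0.0) != (row[b] == 0.0))
    for a, b in combinations(range(n_columns), 2)
}

simple_structure = (
    min(zeros_per_row) >= 1
    and min(zeros_per_column) >= n_columns
    and min(xo_per_pair.values()) >= n_columns
)
print(round(correlation(1, 2), 3), communalities, simple_structure)
```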
3. Two-by-two rotation to simple structure. The problem
for the experimenter, however, is to discover this “simple
structure,” if it exists; he is not, like us, in the position of
knowing that it does exist, and what it is. Thurstone's
original method was to use two-by-two rotations, in each
rotation endeavouring to obtain some zero loadings. Let us
illustrate by our artificial example, taking first the centroid
factors I and II. Using their centroid loadings as
co-ordinates, we obtain Figure 23. At once we notice that
the test points 3, 4, and 6 are almost collinear on a radius
from the origin, and that if we rotate the axes clockwise
through about 42° the new position of I, labelled I₁ in the
diagram, will almost pass through these test points, while
the new axis II₁ will almost pass through test point 1. On
these new axes, therefore, Tests 3, 4, and 6 will have hardly
any projections on axis II₁; that is, will have hardly any
loadings in a factor along II₁. From tables we find
sin 42° = .669, and cos 42° = .743. We have then:

[Figure 23.]

          Old loadings        New loadings
Test       I       II          I₁      II₁
1        .542    .612        −.007    .817
2        .629    .342         .239    .675
3        .529   −.492         .722   −.012
4        .281   −.182         .331    .053
5        .628    .143         .371    .526
6        .429   −.424         .602   −.028

multipliers   .743   −.669   for I₁ loadings,
              .669    .743   for II₁ loadings.


We have now obtained our desired three zero (or near zero)
loadings in factor II₁. Accepting the approximations to
zero as good enough for the present, we next make Figure 24
from the loadings of I₁ and III₀ in the same way as we made
the former figure. In this, Test 1 falls quite near the
origin. Tests 5 and 6 are approximately on one radius, and
Tests 2 and 4 on another, and these radii are at right angles
to one another. If we rotate the axes I₁ and III₀ rigidly
through a clockwise turn of about 49° they will pass almost
through these radial groups and nearly zero projections will
result.* Using sin 49° = .755 and cos 49° = .656 we perform
a similar calculation to the preceding, using the loadings
I₁ and III₀ as starting-point and obtaining loadings on I₂
and III₁ (the subscript indicating the number of rotations
that axis has undergone). We have finally, putting our
results together, the table of loadings below.†


Figure 24.

* The rotation might with advantage have been carried a little
further.

† The matrix symbols, using Thurstone’s notation, are given for
the convenience of mathematical readers. Others should ignore
them. When the tests are many and the centroids few, a saving can
be effected by picking tests equal in number to the factors and
performing two-by-two rotations on their loadings F₁. Let the
resulting loadings be V₁. Then R = F₁⁻¹V₁ can be used as a rotating
matrix on the whole table F of centroid loadings. The tests chosen
to form F₁ should represent different clusters.

Test      I₂      II₁     III₁
1       −.060    .817     .043
2        .420    .675    −.048
3        .329   −.012     .670
4        .632    .053    −.111
5        .037    .526     .460
6        .124   −.028     .690

Clearly, this is an approximation to the loadings of the
factors A, B, and C which we who are in the secret (as a
real experimenter is not) know to have been used in making
the correlations: III₁ here is A, I₂ here is B, and II₁ is C.
The small loadings are not quite zero, and the other
loadings not quite the same, but a further set of rotations
would refine the results and bring them nearer to the
A B C values.

4. New rotational method. When this two-by-two
rotational method is used on a large battery of tests, with
perhaps six or seven factors instead of three, it is not
only laborious but somewhat difficult to follow. Thurstone
has, however, devised a method of rotation which
takes the factors three at a time, and to this we now turn,
still using our small artificial example as illustration. In
this example, since there are only three factors, this new
method leads to a complete solution at once. With more
factors the matter would be more complicated.

If the reader will think of the three centroid factors as
represented by imaginary lines in the room in which he is
sitting (Figure 25), he will be aided in following the
explanation of this new method. Imagine the first
centroid axis to be vertically in the middle of the room,
and the other two centroid axes on the carpet, at right
angles to the first and to each other. The test points are
in various positions in the room space, if we take their three
centroid loadings as co-ordinates and treat the distance from
floor to ceiling as unity. Imagine each test point joined
by a line to the origin (in the middle of the carpet, where
the axes cross). The lengths of these lines are the square 
roots of the communalities, and the loadings on the first 
centroid factor are their projections on to the vertical axis, 
the height, that is, of each test point above the floor. 


Figure 25 (not to scale). 


Thurstone now imagines each of these lines or communality
vectors produced until it hits the ceiling, making
a pattern of dots on the ceiling. These extended vectors 
now all have unit projection on the first centroid axis, 
for we agreed to call the distance from floor to ceiling 
unity. Their y and z co-ordinates on the ceiling will be 
correspondingly larger than their loadings on the second 
and third centroid factors, and can be obtained by dividing 
each row of the centroid loadings by the first loading. In 
our case this gives us the following table, obtained in the 
manner just mentioned from the table on page 153. 


Extended centroid projections

Test      I       II      III
 1      1.000   1.129    .137
 2      1.000    .544   -.553
 3      1.000   -.930    .361
 4      1.000   -.648  -1.957
 5      1.000    .228    .436
 6      1.000   -.988    .837
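The extension of the communality vectors to the ceiling is a single division per row. A minimal sketch, in which the function name extend is hypothetical and not Thurstone’s own terminology for a routine:

```python
# Centroid loadings (I, II, III) of the six tests.
F = [
    (0.542,  0.612,  0.074),
    (0.629,  0.342, -0.348),
    (0.529, -0.492,  0.191),
    (0.281, -0.182, -0.550),
    (0.628,  0.143,  0.274),
    (0.429, -0.424,  0.359),
]

def extend(loadings):
    """Divide each row by its first loading, so that every vector
    has unit projection on the first centroid axis."""
    return [[value / row[0] for value in row] for row in loadings]

extended = extend(F)
for test, row in enumerate(extended, start=1):
    print(test, [round(value, 3) for value in row])
```

The second and third entries of each row are the co-ordinates of that test’s dot on the ceiling.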
The second and third columns are now the co-ordinates
of those dots on the ceiling of which we spoke. A diagram
of the ceiling, seen from above, is given in Figure 26 and
the important point about it is that the dots form a
triangle.

If the reader will now picture this triangle as drawn on the
ceiling of his room, and remember that the origin, where the
centroid axes crossed, is in the middle of the carpet, he can
next imagine an inverted three-cornered pyramid, with the
triangle on the ceiling as its base, the origin in the middle
of the carpet as its apex and the communality vectors 1, 4,
and 6 as its edges. The vector 5 lies on one of the faces of
this pyramid; vector 2 lies on another; vector 3 lies on the
remaining face, all springing from the origin and going up to
the ceiling.

Figure 26.

5. Finding the new axes. If now we choose for new
axes (in place of the centroid axes) three lines at right
angles respectively to the three plane faces of our pyramid,
the test projections on these axes will clearly have the
zeros we desire. The three vectors 1, 2, and 4 all lie in
one face, and will have zero projections on the axis A′
at right angles to that face. The vectors 1, 5, and 6 will
have zero projections on the line B′ at right angles to their
face. The vectors 3, 4, and 6 will have zero projections on
C′ at right angles to their face. The reader should
visualize these new axes in his room. It remains to be
shown how the other, non-zero, projections are to be
calculated, and to inquire whether these new axes are
orthogonal, and whether they can be identified with the
factors A, B, and C.

[...] equations of the three sides [...] Where there are many
[...] collinear, one plan is to draw a line through them by
eye, and measure the distances a and b it cuts off on the
axes, then using the equation

    y/a + z/b = 1.

Or we can write down the equations of the lines joining
points at the corners, either actual test points, or the places
where our lines intersect, using the equation

    (lv - mu) + (m - v) y + (u - l) z = 0

where l, m are the co-ordinates of one corner, and u, v of
another. We obtain in our case

    -2.121 + 2.094y - 1.777z = 0    for line 1, 2, 4
    -1.080 +  .700y + 2.117z = 0    for line 1, 5, 6
     2.476 + 2.794y +  .340z = 0    for line 3, 4, 6

where y means the extended II, and z the extended III.
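The two-corner equation can be tried on the first side of the triangle. This is a minimal sketch, assuming the extended co-ordinates of Tests 1 and 4 as the two corners; the function name line_through is hypothetical.

```python
def line_through(p, q):
    """Coefficients (c, a, b) of  c + a*y + b*z = 0  for the line joining
    corner p = (l, m) to corner q = (u, v) in the (y, z) diagram."""
    (l, m), (u, v) = p, q
    return (l * v - m * u, m - v, u - l)

test1 = (1.129, 0.137)     # extended II and III of Test 1
test4 = (-0.648, -1.957)   # extended II and III of Test 4
c, a, b = line_through(test1, test4)
print(round(c, 3), round(a, 3), round(b, 3))

# Test 2 should lie very nearly on the same line.
test2 = (0.544, -0.553)
residual = c + a * test2[0] + b * test2[1]
print(round(residual, 3))
```

The small residual for Test 2 is what one would expect if Tests 1, 2, and 4 are collinear on the ceiling.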
Before we go further we have to divide each equation
through by the root of the sum of the squares of its
coefficients, so that the new coefficients sum to unity when
squared. This is called normalizing and is necessary in
order to keep the communalities right and for other reasons.
The equations then are:

    -.611 + .603y - .512z = 0    (1)
    -.436 + .283y + .854z = 0    (2)
     .660 + .745y + .091z = 0    (3)

and it is clear, from the way in which they have been
reached, that these equations will be satisfied by the
extended co-ordinates of certain of the rows in the table on
page 153. Consider the first equation and write its
coefficients above the columns of that table, placing -.611
over the first column, as shown at the foot of page 159.

Multipliers   -.611    .603   -.512     Weighted
Test            x       y       z         sum
 1            .542    .612    .074       .000
 2            .629    .342   -.348       .000
 3            .529   -.492    .191      -.718
 4            .281   -.182   -.550       .000
 5            .628    .143    .274      -.438
 6            .429   -.424    .359      -.701

If we multiply each column by the multiplier above it
and add the rows we get the quantities shown on the right
for comparison with page 154. The zeros are in the right
places for factor A. The other loadings are, however,
negative, but that can be easily put right by changing all
the signs of the multipliers, which we are at liberty to do.
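Normalizing and the weighted sums can be followed in a few lines. The sketch below is an illustration only, with the hypothetical helper normalize; it starts from the unnormalized equation of line 1, 2, 4 and reverses the signs of the multipliers at once, as the text allows.

```python
import math

# Centroid loadings (I, II, III) of the six tests.
F = [
    (0.542,  0.612,  0.074),
    (0.629,  0.342, -0.348),
    (0.529, -0.492,  0.191),
    (0.281, -0.182, -0.550),
    (0.628,  0.143,  0.274),
    (0.429, -0.424,  0.359),
]

def normalize(coefficients):
    """Scale the coefficients so that their squares sum to unity."""
    norm = math.sqrt(sum(c * c for c in coefficients))
    return [c / norm for c in coefficients]

line_124 = (-2.121, 2.094, -1.777)                 # line through Tests 1, 2, 4
multipliers = [-c for c in normalize(line_124)]    # signs reversed

# Weighted sum of each row: the loading of each test on factor A.
factor_A = [sum(m * x for m, x in zip(multipliers, row)) for row in F]
print([round(m, 3) for m in multipliers])
print([round(value, 3) for value in factor_A])
```

Tests 1, 2, and 4 come out with loadings that are zero to about three decimal places, and the remaining loadings are now positive.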


Similarly, using eqns. (2) and (3) we get the loadings of
factors B and C exactly, except for an occasional difference
due to rounding off at the third decimal place. We have,
indeed, found the matrix product FΛ,

 .542  .612  .074                               .     .    .821
 .629  .342 -.348      .611  .436  .660         .    .475  .639
 .529 -.492  .191  x  -.603 -.283  .745   =   .718  .206    .
 .281 -.182 -.550      .512 -.854  .091         .    .644   .
 .628  .143  .274                             .438    .    .546
 .429 -.424  .359                             .702    .     .

except, as has been already said, for occasional discrepancies
in the third decimal place. The procedure we have described
has enabled us to discover this last matrix, with which, in
fact, we began. And by analogy (is the deduction sound?) an
experimenter with experimental data who follows this
procedure and reaches simple structure concludes that that is
how his correlations were made. Certainly that is how they
may have been made.

The matrix Λ beginning with .611 is the rotating matrix
which turns the axes I, II, III into the new positions
A, B, C. Its columns are the direction-cosines of A, B, and C
with reference to the orthogonal system I, II, III. Are A, B,
and C orthogonal? The cosines of the angles between them can
by a well-known rule be found by premultiplying the rotating
matrix by its transpose. When we do so we find Λ′Λ = I, viz.:

 .611 -.603  .512       .611  .436  .660       1  .  .
 .436 -.283 -.854   x  -.603 -.283  .745   =   .  1  .
 .660  .745  .091       .512 -.854  .091       .  .  1

(again allowing for third decimal place discrepancies).
That is to say, the angles between A, B, and C have zero
cosines, they are right angles.
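The test of orthogonality is easily mechanized. A minimal sketch, assuming the three-figure rotating matrix quoted in the text; the function name gram is hypothetical, and it simply forms the inner products of the columns, which is what premultiplying by the transpose amounts to.

```python
# Rotating matrix: columns are the direction cosines of A, B, and C.
ROTATING = [
    [0.611,  0.436, 0.660],
    [-0.603, -0.283, 0.745],
    [0.512, -0.854, 0.091],
]

def gram(matrix):
    """Transpose of the matrix multiplied by the matrix itself:
    entry (i, j) is the inner product of columns i and j."""
    columns = list(zip(*matrix))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]

cosines = gram(ROTATING)
for row in cosines:
    print([round(value, 3) for value in row])
```

Diagonal entries near unity and off-diagonal entries near zero indicate orthogonal axes, within the rounding of the three-figure coefficients.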

The axes A, B, and C were drawn at right angles to the 
three planes which form the pyramid mentioned above, and 
therefore these three planes are also at right angles to one 
another. (Our rough sketch in Figure 25 made the pyra- 
mid too acute.) It follows that A, B, and C are actually 
the edges of the pyramid. In our example (though this 
need not be the case) they happen to pass each through a 
test point in the room, A through Test 6, B through Test 4, 
and C through Test 1. These tests are not identical with 
the factors, for each test contains a specific element, not in 
the common-factor space, but at right angles to it. What 
we have called a test point is the end of the unit test vector 
projected on to the common-factor space. The complete 
test vectors are out in a space of more dimensions, of 
which the three-dimensional common-factor space is a 
subspace. 

6. Landahl preliminary rotations.—When there are more 
than three centroid factors, the calculations are not so 
simple. If the common-factor space is, for example, 
four-dimensional, then the table of extended vectors, in 
addition to its first column of unities, will have three other
columns. The two-dimensional ceiling of our room, in our 
former analogy, has here become three-dimensional, a 
hyper-plane at right angles to the first centroid axis. On 
paper its dimensions can only be graphed two at a time, 
and no complete triangle will be visible among the dots. 
But sets of dots will be seen to be collinear, lines can be 
drawn through them, and a procedure similar to that out- 
lined above followed. This will become clearer when we 
work a four-dimensional example. First, however, it is 
desirable to explain, on our simple three-dimensional 
example, a device which facilitates the work on higher 
dimensional problems, called the Landahl rotation. It is 
unnecessary in the three-dimensional case, and we are 
using it only to explain it for use with more than three 
dimensions. 

A Landahl rotation turns the centroid axes solidly
to a position where each of them is equally inclined to the
original first centroid axis. In our imagined room the first 
centroid axis ran vertically from the middle of the floor 
to the middle of the ceiling, while the other two were 
drawn on the floor itself. Imagine all three (retaining 
their orthogonality) to be moved, on the origin as pivot, 
until they are equally inclined to the vertical so that they 
enclose the inverted pyramid of Figure 25. That is a 
Landahl rotation. The lines through the test points have 
not moved. They remain where they were, and still hit 
the ceiling in the same pattern of dots. The projections of 
the extended vectors on to the original first centroid 
axis all still remain unity. But for the next step in this 
method we need their projections on to the Landahl axes. 
We obtain these by post-multiplying the matrix of centroid
extended loadings by a Landahl matrix, an orthogonal
matrix with each element in its first row equal to 1/√t,
where t is the order of the matrix; that is, its number of
rows or columns (Landahl, 1938). We need a Landahl
matrix of order 3, for example:

    .577   .577   .577
    .816  -.408  -.408
    .000   .707  -.707

The element .577 is the cosine of the angle which each axis
makes, after rotation, with the original position of the first
centroid axis.

When the table of extended vector projections on page 
157 is post-multiplied by the above matrix, the table on 
page 163 results, giving the projections of the extended 
vectors on to the Landahl axes L, M, N. 
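The post-multiplication can be sketched as follows. This is an illustration under stated assumptions: the extended co-ordinates are taken to three figures (those of Test 3 recomputed from its centroid loadings), and the helper post_multiply is hypothetical.

```python
# Landahl matrix of order 3, as given in the text.
LANDAHL = [
    [0.577,  0.577,  0.577],
    [0.816, -0.408, -0.408],
    [0.000,  0.707, -0.707],
]

# Extended centroid projections (I, II, III) of the six tests.
EXTENDED = [
    (1.0,  1.129,  0.137),
    (1.0,  0.544, -0.553),
    (1.0, -0.930,  0.361),
    (1.0, -0.648, -1.957),
    (1.0,  0.228,  0.436),
    (1.0, -0.988,  0.837),
]

def post_multiply(rows, matrix):
    """Each row (as a row vector) multiplied by the matrix."""
    columns = list(zip(*matrix))
    return [[sum(x * m for x, m in zip(row, column)) for column in columns]
            for row in rows]

projections = post_multiply(EXTENDED, LANDAHL)
for test, (l, m, n) in enumerate(projections, start=1):
    print(test, round(l, 3), round(m, 3), round(n, 3))
```

Since the projection of every extended vector on the original first centroid axis is still unity, .577 (l + m + n) should come out close to 1 in every row.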

Projections on Landahl axes*

Test      L        M        N
 1      1.498     .213     .020
 2      1.021    -.036     .746
 3      -.182    1.212     .701
 4       .048    -.542    2.225
 5       .760     .792     .176
 6      -.229    1.572     .388


From this table three diagrams LM, LN, and MN can
be made, and the reader is advised to draw them. Each
of them shows a triangular distribution of dots and in this
simple three-dimensional example only one of them is
needed. But in a multi-dimensional problem several are
needed, and as a rule only one line is used on each diagram
employed. Here, from the LN diagram we find the
equations of the three sides of the triangle to be:

    -2.205l - 1.450n + 3.332 = 0
      .368l + 1.727n -  .586 = 0
     1.887l -  .277n +  .528 = 0

We want to make these homogeneous in l, m, and n, and so
we add, after each of the numerical terms, the factor
.577 (l + m + n), which equals unity. The equations
then are:

    -.282l + 1.923m +  .473n = 0
     .030l -  .338m + 1.389n = 0
    2.142l +  .305m +  .028n = 0
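The step of making an equation homogeneous, and the normalizing that follows it, can be sketched for the first side of the triangle. The function names homogenize and normalize are hypothetical; the only figures assumed are those of the first LN equation and the cosine .577.

```python
import math

COSINE = 0.577  # cosine between each Landahl axis and the first centroid axis

def homogenize(coef_l, coef_n, constant, cosine=COSINE):
    """Replace the constant by constant * cosine * (l + m + n) and
    return the coefficients of l, m, and n."""
    shift = constant * cosine
    return (coef_l + shift, shift, coef_n + shift)

def normalize(coefficients):
    """Scale the coefficients so that their squares sum to unity."""
    norm = math.sqrt(sum(c * c for c in coefficients))
    return tuple(c / norm for c in coefficients)

side = homogenize(-2.205, -1.450, 3.332)
unit = normalize(side)
print([round(c, 3) for c in side])
print([round(c, 3) for c in unit])
```

The second printed line gives the normalized coefficients of l, m, and n for this side.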


* After a Landahl adjustment the axes are not infrequently
already near simple structure, as here. It is sometimes worth while
to rotate them slowly round the original first centroid, like spinning
an umbrella, to improve the approximation to zero entries. This
can be done by an orthogonal matrix whose columns sum to unity,
as e.g.

     .9900  -.0946   .1046
     .1046   .9900  -.0946
    -.0946   .1046   .9900

or its transpose: and the rotation will be the slower, the nearer the
diagonal elements are to unity.
After normalizing, these become:

    -.141l + .961m + .236n = 0
     .021l - .236m + .971n = 0
     .990l + .141m + .013n = 0

Writing the coefficients as columns in a matrix, and
premultiplying by Landahl’s matrix (since at an earlier
stage we post-multiplied by it) we obtain:

     .609   .436   .660
    -.603  -.283   .745
     .513  -.853   .090

the same matrix Λ as we arrived at (page 160) without the
use of Landahl’s rotation. The advantage of using a
Landahl rotation appears only in problems with more
than three common factors. The reader can readily make
a Landahl matrix of any required order, say 5. Fill the
first row with the root reciprocal of 5, .447. Complete
the first column by putting in the second place .894
(because .447² + .894² = 1), and below that zeros. The
second row must then be completed with equal elements,
all negative, such that the row sums to zero. Then the
second column is completed in a similar way, and the third
row, and so on. The reader should finish it. There are
alternative forms possible, one of which is used below.
An unfinished Landahl matrix:

    .447   .447   .447   .447   .447
    .894  -.224  -.224  -.224  -.224
    .000   .866  -.289  -.289  -.289
    .000   .000
    .000   .000
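The recipe just given can be written out as a short routine. This is a minimal sketch of one way of following the steps, with the hypothetical function name landahl; it is not claimed to be Landahl’s own procedure in every detail, and the alternative forms mentioned in the text are not produced by it.

```python
import math

def landahl(order):
    """Build a matrix by the recipe in the text: first row all 1/sqrt(order);
    each later row has zeros, then one positive element that completes its
    column to unit length, then equal negative elements making the row sum zero."""
    matrix = [[0.0] * order for _ in range(order)]
    matrix[0] = [1.0 / math.sqrt(order)] * order
    for k in range(1, order):
        column = k - 1
        above = sum(matrix[r][column] ** 2 for r in range(k))
        lead = math.sqrt(1.0 - above)      # completes the column to unit length
        matrix[k][column] = lead
        for j in range(k, order):          # equal negative elements
            matrix[k][j] = -lead / (order - k)
    return matrix

L5 = landahl(5)
for row in L5:
    print([round(value, 3) for value in row])
```

For order 3 the same routine gives the matrix with .577, .816, -.408, and .707 used earlier.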


7. A four-dimensional example. The following example
of a problem with four common factors is only partly
worked out, so that the reader can finish it as an exercise.
It also is an artificial example, and orthogonal simple
structure can be arrived at. The centroid analysis gave
four centroid factors with the loadings shown in this table:

Centroid loadings F

Test      I       II      III      IV
 1      .727    .517    .094    .126
 2      .575    .105    .553    .049
 3      .810    .289    .246   -.246
 4      .588    .417   -.367   -.382
 5      .524   -.583   -.450    .183
 6       ...     ...     ...    -.013
 7      .624   -.318   -.187   -.254
 8      .594   -.551    .239    .084
 9      .626    .252   -.169    .562
10      .645    .307   -.357   -.109

After these have been “extended” (i.e. divided in each
row by the first loading) they were post-multiplied by a
Landahl matrix, one of the alternative forms, viz.:

    .5    .5    .5    .5
    .5    .5   -.5   -.5
    .5   -.5    .5   -.5
    .5   -.5   -.5    .5

and the resulting projections on the Landahl axes were
thus found to be:

Test      L        M        N        P
 1      1.007     .704     .122     .166
 2      1.115     .068     .848    -.030
 3       .679     .678     .625     .018
 4       .218    1.492     .158     .132
 5      -.311     .199     .453    1.660
 6       .455    -.247    1.270     .522
 7      -.107     .598     .808     .701
 8       .308    -.235    1.094     .833
 9      1.015     .387    -.285     .883
10       .376    1.099     .070     .454

Six diagrams can be made, and it is advisable to draw
them all, though not all are necessary. The LN diagram
is shown in Figure 27. We scan it for collinear points
(not necessarily radial) which have all or nearly all the other
points on one side of their line, and note the line 5, 4, 10, 9.
Its equation is readily found to be approximately:

    .738l + 1.327n - .371 = 0.

We make this homogeneous by substituting for unity, after
the numerical term .371, the quantity .5 (l + m + n + p),
for .5 is the cosine of the angle each of the Landahl axes
makes with the original first centroid axis. This gives us
the equation (not yet normalized):

    .553l - .185m + 1.141n - .185p = 0.

Three more equations are needed, and one of them can
indeed be obtained from the same diagram, on which
points 5, 7, 8, 6 are very nearly collinear. The reader is
advised to draw the remaining diagrams and complete the
calculations following the steps of our previous example.
The above equation refers to a line which makes a fairly
big angle with N. It is desirable to look for the remaining
three lines making large angles (approaching right angles)
with L, M, and P.

Figure 27.

It will be remembered that in our earlier example the
sign of one equation had to be changed at the end of the
calculation because large negative values were appearing
in the final matrix of loadings. This can be obviated
by attending to the following rule. If the other test-points
are on the same side of the line as the origin the numerical
term must be positive in the equation; if they are on the
side remote from the origin the numerical term must be
negative. In the adjacent diagram, the origin and the
other points are on opposite sides of the line through
5, 4, 10, 9 and therefore the numerical term must be
negative, as it is (-.371). Had it been positive all the
signs of the equation would have required to be changed.
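The rule of signs can be expressed as a small check. The sketch below is one possible reading of the rule, not a procedure given in the text: the hypothetical function orient evaluates the left-hand side of the equation at the other test points and reverses all signs if most of them come out negative, which is equivalent to the rule because the origin gives the numerical term itself.

```python
def orient(coef_l, coef_n, constant, other_points):
    """Return the equation with its signs chosen so that the other
    test points give positive values of coef_l*l + coef_n*n + constant."""
    values = [coef_l * l + coef_n * n + constant for l, n in other_points]
    positives = sum(1 for value in values if value > 0)
    if 2 * positives < len(values):
        return (-coef_l, -coef_n, -constant)
    return (coef_l, coef_n, constant)

# (L, N) projections of the tests not on the line 5, 4, 10, 9.
others = [
    (1.007, 0.122),   # Test 1
    (1.115, 0.848),   # Test 2
    (0.679, 0.625),   # Test 3
    (0.455, 1.270),   # Test 6
    (-0.107, 0.808),  # Test 7
    (0.308, 1.094),   # Test 8
]

as_found = orient(0.738, 1.327, -0.371, others)
reversed_input = orient(-0.738, -1.327, 0.371, others)
print(as_found, reversed_input)
```

Whichever way round the equation is first written, the oriented form has a negative numerical term here, since the other points lie on the side of the line remote from the origin.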

8. Ledermann’s method of reaching simple structure.
Ledermann has pointed out that when simple structure
can be attained (whether orthogonal or oblique) then as
many r-rowed principal minors of the reduced correlation
matrix must vanish as there are common factors; and
that it follows that the same number of vanishing
determinants must be discoverable in the table of centroid
loadings. Thus, for example, in the table of centroid
loadings on page 153 the three determinants composed
respectively of rows 1, 2, and 4; of rows 1, 5, and 6; and
of rows 3, 4, and 6 all vanish, and these rows are where the
zeros come in the three columns of the simple structure.
This gives an alternative method of reaching simple 
structure. Test every possible r-rowed determinant in 
the centroid table of r factors. If r of them are discovered 
to vanish, then simple structure may be and probably is 
possible. Each of these vanishing determinants will 
provide a column of the rotating matrix A, for which pur- 
pose we delete any one of its rows and calculate all the r—1 
rowed minors from what is left. The column has then to 
be normalized. This process works equally well for 
oblique simple structure (see next Chapter). Its draw- 
back, when the number of factors is large, is the necessity 
of calculating so many determinants to discover those that 
vanish. 
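For the three-factor example the search can be carried out exhaustively, since there are only twenty determinants. The following is a minimal sketch and not Ledermann’s own computing scheme: the tolerance of .002 is an assumed allowance for three-figure rounding, and in three dimensions the two-rowed minors of the two retained rows are simply the components of their vector product.

```python
import math
from itertools import combinations

# Centroid loadings (I, II, III) of the six tests.
F = [
    (0.542,  0.612,  0.074),
    (0.629,  0.342, -0.348),
    (0.529, -0.492,  0.191),
    (0.281, -0.182, -0.550),
    (0.628,  0.143,  0.274),
    (0.429, -0.424,  0.359),
]

def cross(u, v):
    """Signed two-rowed minors of the pair of rows u, v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def det3(u, v, w):
    """Three-rowed determinant with rows u, v, w."""
    return sum(a * b for a, b in zip(cross(u, v), w))

TOLERANCE = 0.002  # assumed allowance for rounding in the loadings

vanishing = [rows for rows in combinations(range(6), 3)
             if abs(det3(F[rows[0]], F[rows[1]], F[rows[2]])) < TOLERANCE]

def column_from(rows):
    """Delete the last row of the vanishing determinant, take the minors
    of the two that remain, and normalize them."""
    minors = cross(F[rows[0]], F[rows[1]])
    norm = math.sqrt(sum(m * m for m in minors))
    return tuple(m / norm for m in minors)

columns = [column_from(rows) for rows in vanishing]
for rows, column in zip(vanishing, columns):
    print([r + 1 for r in rows], [round(c, 3) for c in column])
```

The columns obtained agree, apart from the free choice of sign, with the normalized coefficients found from the triangle on the ceiling.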

9. Limits to the extent of factors.* Orthogonal simple
structure requires that no factor shall extend through many
tests, and it is possible to decide beforehand, from the
correlations, whether factors running through not more
than s tests each are adequate to give the measured
correlations, leaving n - s zeros. They will not as a rule be
able to do so if the average correlation exceeds
(s - 1)/(n - 1): more exactly, not if the largest latent root
of the matrix is larger than s. If these rules are to be
applied when communalities are used, as is the case when
testing whether orthogonal simple structure is possible, the
matrix should first be “corrected for communality,” i.e. each
r must be divided by the square root of the product of the
two communalities concerned. Approximations to the largest
latent root of a matrix of correlations, when the entries are
all positive, are:

    (sum of the whole matrix) / n

or more accurately:

    (sum of the squares of the column totals) / (sum of the whole matrix)

* A brief summary of a chapter with this title in previous editions.

An exact test for the possibility of orthogonal simple
structure has been given (Ledermann, 1936) and is
described in the Appendix, page 367, but it requires a
prohibitive amount of calculation.
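The two approximations to the largest latent root are quickly computed. The sketch below uses a small hypothetical correlation matrix with unit diagonal, invented purely for illustration and not taken from the text; the function names are likewise hypothetical.

```python
# A hypothetical 3 x 3 correlation matrix, for illustration only.
R = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.2],
    [0.3, 0.2, 1.0],
]

def root_first_approximation(matrix):
    """Sum of the whole matrix divided by the number of tests n."""
    total = sum(sum(row) for row in matrix)
    return total / len(matrix)

def root_second_approximation(matrix):
    """Sum of the squares of the column totals divided by the sum
    of the whole matrix."""
    column_totals = [sum(column) for column in zip(*matrix)]
    total = sum(column_totals)
    return sum(t * t for t in column_totals) / total

def factors_of_extent_s_doubtful(matrix, s):
    """The rule of the text: doubtful if the (approximate) largest
    latent root is larger than s."""
    return root_second_approximation(matrix) > s

first = root_first_approximation(R)
second = root_second_approximation(R)
print(round(first, 4), round(second, 4))
print(factors_of_extent_s_doubtful(R, 1), factors_of_extent_s_doubtful(R, 2))
```

With unit diagonal the first approximation equals 1 + (n - 1) times the average correlation, so that its exceeding s is the same condition as the average correlation exceeding (s - 1)/(n - 1).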

Even, however, when orthogonal simple structure cannot 
be attained with orthogonal factors, it may be possible to 
reach it with oblique factors. 

10. Leading to oblique factors. In this chapter we have 
kept our factors orthogonal; that is, independent, un- 
correlated with one another. It is natural to desire them 
to be different qualities, and convenient statistically. In 
describing a man, or an occupation, it would seem to be 
both confusing and uneconomical to use factors which, 
as it were, overlapped. Yet in situations where more 
familiar entities are dealt with, we do not hesitate to use 
correlated measures in describing a man. For instance, 
we give a man’s height and weight, although these are 
correlated qualities. 

Often, moreover, a battery of tests which will not 
permit simple structure to be reached if orthogonal 
factors are insisted on will nevertheless do so if the factors 
are allowed to sag away a little from strict orthogonality. 
Even as early as in Vectors of Mind, Thurstone expressly 
permitted this. It can clearly be defended on the ground 
that even if the factors were uncorrelated in the whole 
population, they might well be correlated to some extent in 
the sample of people actually tested. I was at one time 
under the impression that this comparatively slight de- 
parture from orthogonality was all that was contemplated 
by Thurstone. But he and his fellow-workers now have 
the courage of their convictions, and permit factors to
depart from orthogonality as much as is necessary to attain
simple structure, even if they are then found to be quite
highly correlated. A chapter on these oblique factors* is
therefore necessary, and out of them arise Thurstone’s 
“ second order factors.” 

11. Parallel proportional profiles. A method which, like
Thurstone’s simple structure, is meant to enable us to
arrive at factors which are real entities, or to check
whether our hypotheses about the factor composition of
tests are correct, has been put forward by R. B. Cattell
(1944a, 1946), and has interesting possibilities which its
author will no doubt develop. The essence of his idea
is that “if a factor is one which corresponds to a true
functional unity, it will be increased or decreased ‘as a
whole’,” and therefore if the same tests are given under
two different sets of circumstance, which favour a certain
factor more in one case and less in the other, the loadings
of the tests in that factor should all change in the same
proportion. Experimental trials of this principle may be
expected soon from its author. Among “different
circumstances” he mentions different samples of subjects,
differing, say, in age or sex, and different methods of
scoring, or different associated tests in the battery. But he
prefers another kind of change of circumstance; namely a
change “from measures of static, inter-individual differences
to measures from other sources of differences in the same
variables.” He instances, among his examples,
intercorrelating changes in scores of individuals with time, or
intercorrelating differences of scores in twins. We may
thus have two, or several, centroid analyses, and the
mathematical problem is to find rotations which will leave the
profile of loadings of a certain factor similar in all the factor
matrices. It may even be that the profiles of several
factors could be made similar. These factors would then
satisfy Cattell’s requirement as corresponding to “true
functional unities.” The necessary modes of calculation
to perform these rotations have not yet been more than
adumbrated, however.


* It must be clearly understood that this obliquity or correlation 
of factors is quite a different matter from the correlation of estimates, 
even of orthogonal factors, due to the excess of factors over tests 
described on pages 237 to 242. 




CHAPTER XII 
OBLIQUE FACTORS 


1. Pattern and structure. So long as the factors are
orthogonal, the loadings in the matrix of loadings are also
the correlations between the factor and the tests, but this
ceases to be the case when the factors are correlated. The
word “loading” continues to be used for the coefficients
such as l, m, and n in equations like


    z = lα + mβ + nγ


and the matrix or table of these is called a pattern, while 
the matrix of correlations between tests and factors is 
called a structure. The entries in a structure are pro- 
jections from a point on to certain axes. The entries in a 
pattern are the oblique co-ordinates of that point along 
those axes. The two are only identical if the axes are 
orthogonal. 

Moreover, as soon as the factors become oblique, it 
becomes necessary to distinguish between “ reference 
vectors ” and “ primary factors.” The reference vectors 
are the positions to which the centroid axes have been 
rotated so that the test-projections on to them include a 
number of zeros. Each reference vector is at right angles 
to a hyperplane containing a number of communality 
vectors. A hyperplane is a space of one dimension less 
than the common-factor space. In our first example in 
Chapter XI the hyperplanes were ordinary planes, the 
faces of the three-cornered pyramid there referred to (see 
page 157) and each reference vector was at right angles to 
one of those faces. 

The primary factor corresponding to a given reference 
vector is the line of intersection of all the other hyper- 
planes, excluding, that is, the hyperplane at right angles to 
the reference vector. In our three-dimensional
common-factor space the primary factor was the edge of the
pyramid where those two faces met, excluding that face to
which the reference vector was orthogonal.

Now, when the reference vectors turn out to be at right 
angles to each other, as they did in that example, each 
reference vector is identical with its own primary factor. 
But not when the reference vectors turn out to be oblique. 
In Chapter XI we did not distinguish them, and called their 
common line the “ factor.” But in this chapter the dis- 
tinction must be kept clearly in mind. It is the primary 
factors Thurstone wants. The reference vectors are only 
a means to an end. 

Thurstone’s second method of rotation described in 
Chapter XI, the method in which the communality 
vectors are “ extended,” and lines drawn on the diagrams 
which are not necessarily radial lines, will not keep the 
axes orthogonal, but seeks for the axes on which a number 
of projections are zero, regardless of whether the resulting 
directions are orthogonal or oblique. In general they will 
be oblique, and the examples worked in Chapter XI only 
gave orthogonal simple structure because they had been 
devised so as to do so. The test of orthogonality is that 
the matrix of rotation, premultiplied by its transpose, 
gives the unit matrix (see page 160). Or in other words, 
that the inner products of the columns of the rotating 
matrix are all zero. They are the cosines of the angles
between the reference vectors, and the cosine of 90° is
zero.

2. Three oblique factors. To illustrate Thurstone’s
method when the resulting factors are oblique we shall
next work an example devised to give three oblique
common factors. Consider this matrix of correlations:

        1      2      3      4      5      6      7
1       .    .728   .167   .372   .153   .105   .126
2     .728    .     .696   .583   .651   .347   .638
3     .167   .696    .     .857   .775   .709   .740
4     .372   .583   .857    .     .543   .797   .473
5     .153   .651   .775   .543    .     .504   .828
6     .105   .347   .709   .797   .504    .     .433
7     .126   .638   .740   .473   .828   .433    .

which, with guessed communalities, gives these centroid
loadings:

F

Test      I       II      III
 1      .449   -.682    .165
 2      .825   -.478   -.129
 3      .906    .336    .020
 4      .846    .133    .457
 5      .808    .208   -.412
 6      .697    .336    .335
 7      .767    .173   -.468

When these projections on the centroid axes are
“extended,” that is, when each row is divided by the first
loading in that row, we obtain this table:

Test      I        II       III
 1      1.000   -1.519     .367
 2      1.000    -.579    -.156
 3      1.000     .371     .022
 4      1.000     .157     .540
 5      1.000     .257    -.510
 6      1.000     .482     .481
 7      1.000     .226    -.610


The columns II and III in this table represent the
co-ordinates of the “dots on the ceiling” in our analogy of
Chapter XI, p. 157. When we make a diagram of them we
obtain Figure 28. We see that a triangular formation is
present, and we draw the dotted lines shown.

Figure 28.

It is not essential, it may be remarked in passing, that
there be no points elsewhere than on the lines, provided they
are additional to those required to fix the simple structure.
Had it not been for the desirability of keeping the example
small we would have increased the number of tests, and not
only arranged for further points to fall on these lines, but
also included some whose dots fell inside the triangle,
representing tests which involve all three factors.

We find the equations of these lines to be approximately

     .475 +  .50y   +  .95z   = 0    (line 1, 2, 7)
    1.113 +  .183y  - 2.119z  = 0    (line 1, 4, 6)
     .403 - 1.091y  +  .256z  = 0    (line 7, 5, 3, 6)

The coefficients of each equation have to be “normalized,”
that is, reduced proportionately so that the sum of
their squares is unity (for they are to be direction cosines).
These normalized coefficients are then written as columns
in a matrix as follows:

    .405   .464   .338
    .426   .076  -.916    = Λ
    .809  -.883   .215


The table of centroid loadings on page 172 must now be
post-multiplied by this rotating matrix to obtain the
projections of the tests on the three reference vectors which
are at right angles to the planes defined by the dotted lines
in our diagram. We obtain this table:

V = FΛ
(Simple) Structure on the Reference Vectors

Test      L′      B′      D′
 1      .025    .011    .812
 2      .026    .460    .689
 3      .526    .428    .003
 4      .769   -.001    .262
 5      .083    .755   -.006
 6      .696    .053    .000
 7      .006    .782    .000

We have labelled the columns L′, B′, and D′ for a reason
which will become apparent later, when we explain how
the correlations were, in fact, made. This table is a simple
structure, formed by the projections on the reference
vectors. It has a zero (or near-zero) in each row, and
three or more in each column, in the positions to be
anticipated from Figure 28; for example, tests 3, 5, 6,
and 7, which are collinear in the figure, have zeros in
column D′.

Now let us test the angles between the reference vectors.
To do this we premultiply the rotating matrix by its
transpose

Λ′Λ = C

 .405  .426  .809       .405  .464  .338         1    -.494  -.079
 .464  .076 -.883   x   .426  .076 -.916   =   -.494    1    -.103
 .338 -.916  .215       .809 -.883  .215       -.079  -.103    1

This gives the cosines of the angles between the reference
vectors and we see that they are obtuse. The angles are
approximately:

      .    120°    95°
    120°    .      96°
     95°   96°     .
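The angles can be recovered from the columns of the rotating matrix directly. A minimal sketch, with the hypothetical function name angle_between; the columns are renormalized inside the function because the three-figure coefficients do not have squares summing exactly to unity.

```python
import math

# Rotating matrix of the oblique example; columns are L', B', D'.
ROTATING = [
    [0.405,  0.464,  0.338],
    [0.426,  0.076, -0.916],
    [0.809, -0.883,  0.215],
]

def angle_between(u, v):
    """Angle in degrees between two vectors, from their inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    length_u = math.sqrt(sum(a * a for a in u))
    length_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (length_u * length_v)))

col_L, col_B, col_D = zip(*ROTATING)
angle_LB = angle_between(col_L, col_B)
angle_LD = angle_between(col_L, col_D)
angle_BD = angle_between(col_B, col_D)
print(round(angle_LB, 1), round(angle_LD, 1), round(angle_BD, 1))
```

All three angles exceed 90°, which is what the negative cosines indicate.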


As soon as we know that the reference vectors are not 
orthogonal, we have to take account of the fact that the 
primary factors are not identical with them. Each prim- 
ary factor is the line in which the hyperplanes intersect, 
excluding that hyperplane to which the corresponding 
reference vector is orthogonal. In a three-dimensional 
common-factor space like ours the primary factors lie 
along the edges of the pyramid which the extended vectors
form.

Let us return to our mental picture, which the reader 
can place in the room in which he is sitting. The origin, 
immediately below the point O in Figure 28, is in the middle
of the carpet. Figure 28 itself is on the ceiling, seen from
above as though translucent. The radial lines with 
arrowheads are the projections of the primary factors on 
to the ceiling. The projections of the reference vectors 
are not drawn, to avoid confusion in the figure. They 
are near, but not identical with, the primary factors. 

The reader should not be misled by the fact that two of 
the primary factors lie along the same lines as Tests 1 
and 7. It was necessary to allow this in devising an ex- 
ample with very few tests in it (to avoid much calculation 
and printing large tables). But with a large number of 
tests the lines of the triangle could have been defined 
without any test being actually at a corner. 

3. Primary factors and reference vectors.—At about this 
stage a disturbing thought may have occurred to the 
reader. We have sought for, and obtained, simple 
structure on the reference vectors. That is to say, we 
have found three vectors, three imaginary tests, which are 
uncorrelated each with a group of the actual tests, namely 
where there are zeros in the table on page 173. The entries 
in that table are the projections of the actual tests on the 
reference vectors. 

But the primary factors are different from the reference 
vectors. The projections of the tests on to the primary 
factors will be different and will not show these zeros. 
Those projections are, in fact, given in this table (never 
mind for the moment how it is arrived at) : 


Structure on the Primary Factors
F(Λ')⁻¹D

Test      L       B       D
1        .160    .162    .832
2        .408    .666    .793
3        .866    .809    .176
4        .934    .495    .401
5        .541    .927    .152
6        .842    .472    .132
7        .468    .915    .150

Score in Test 1 = .826d + Specific
Score in Test 2 = .536b + .701d + Specific
Score in Test 3 = .612l + .499b + Specific
Score in Test 4 = .895l + .266d + Specific
Score in Test 5 = .880b + Specific
Score in Test 6 = .809l + Specific
Score in Test 7 = .912b + Specific


4. Behind the scenes.—It is now time to divulge what
these “tests” really are and how the “scores” were 
made whose correlations we have been analysing, and to 
compare our analysis with the reality. The example is a 
simpler and shorter variety of a device used by Thurstone 
and published in April 1940 in the Psychological Bulletin. 
The measurements behind the correlations were not made 
on a number of persons, but were made on a number of 
boxes—only eight boxes, to keep down the amount of 
calculation and printing. These boxes were of the follow- 
ing dimensions : 


Box     Length   Breadth   Depth
1         2         2        1
2         3         2        3
3         3         2        2
4         6         3        2
5         4         4        2
6         5         3        1
7         5         4        3
8         4         4        2

Sum      32        24       16
Mean      4         3        2


The “tests” were seven functions of these dimensions, 
and are shown in the next table, which also shows the 
score each box (or “ person ”) would achieve in that test. 
It is as though someone was unable for some reason to 
measure the primary length, breadth, and depth of these 
boxes (as we are unable to measure the primary factors 
of the mind directly) but was able to measure these more 
complex quantities like LB, or √(L² + D²) (as we are
able to measure scores in complex tests):

Boxes = Persons

Test  Formula        1     2     3     4     5     6     7     8      Sum     Mean
1     D²             1     9     4     4     4     1     9     4       36     4.500
2     BD             2     6     4     6     8     3    12     8       49     6.125
3     LB             4     6     6    18    16    15    20    16      101    12.625
4     √(L² + D²)  2.24  4.24  3.61  6.32  4.47  5.10  5.83  4.47    36.28     4.535
5     L + B²         6     7     7    15    20    14    21    20      110    13.750
6     L² + D         5    12    11    38    18    26    28    18      156    19.500
7     B              2     2     2     3     4     3     4     4       24     3.000


With these scores the sums of squares and products of
deviations from the mean are:

          1       2       3       4       5       6       7

1        66    50.5    22.5    10.2      25      29       3
2      50.5    72.9    98.4    16.8   112.3   100.5      16
3      22.5    98.4   273.9    47.9   259.2   398.5      36
4      10.2    16.8    47.9    11.4    37.0    91.3     4.7
5        25   112.3   259.2    37.0   283.5     288      41
6        29   100.5   398.5    91.3     288     800      36
7         3      16      36     4.7      41      36       6
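The table of sums of squares and products can be rebuilt from the box dimensions. The sketch below is illustrative only: the eight sets of dimensions are those implied by the printed scores, and the formulae for Tests 5 and 6 (L + B² and L² + D) are inferred from the printed row sums, so both should be read as assumptions rather than as a transcription.

```python
import math

# Box dimensions implied by the printed scores (assumed, one entry per box).
LENGTH = [2, 3, 3, 6, 4, 5, 5, 4]
BREADTH = [2, 2, 2, 3, 4, 3, 4, 4]
DEPTH = [1, 3, 2, 2, 2, 1, 3, 2]

boxes = list(zip(LENGTH, BREADTH, DEPTH))

# The seven "tests" as functions of the dimensions of one box.
formulas = [
    lambda l, b, d: d * d,                     # Test 1: D squared
    lambda l, b, d: b * d,                     # Test 2: BD
    lambda l, b, d: l * b,                     # Test 3: LB
    lambda l, b, d: math.sqrt(l * l + d * d),  # Test 4: sqrt(L^2 + D^2)
    lambda l, b, d: l + b * b,                 # Test 5: L + B^2 (inferred)
    lambda l, b, d: l * l + d,                 # Test 6: L^2 + D (inferred)
    lambda l, b, d: b,                         # Test 7: B
]

scores = [[f(l, b, d) for (l, b, d) in boxes] for f in formulas]


def deviations(values):
    """Deviations of a list of scores from their own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


devs = [deviations(row) for row in scores]

# S[i][j] is the sum of products of deviations for tests i+1 and j+1.
S = [[sum(a * b for a, b in zip(di, dj)) for dj in devs] for di in devs]

for row in S:
    print("  ".join(f"{value:7.1f}" for value in row))
```

Entries involving Test 4 differ slightly from the printed table, which was evidently worked from scores rounded to two decimals.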


From these the correlations could be calculated by dividing 
each row and column by the square root of the diagonal 
cell entry. But that would make no allowance for specific 
factors, which in all actual psychological tests play a 
considerable part. In the example devised by Thurstone 
on which this is modelled there are no specific factors, but 
it was decided to introduce them here into Tests 5, 6, and 7, 
by increasing their sums of squares. In addition, by an 
arithmetical slip, a small group factor was added to these 
three tests, and this was not discovered for some time. It 
was decided to leave it, for in a way it makes the example 
more realistic, and may be taken to represent an experi- 
mental error of some sort running through these three tests. 

With these changes, the correlations are found, and are 
those with which we began this chapter and which we have 
already analysed into three oblique factors L, B, and D. 
Let us now compare that analysis with the formulae which
we now know to represent the tests. The pattern on
page 177, for example, shows that Test 2 depends only on
factors B and D: and that is correct, for it was, in fact,
their product BD, and L did not enter into it. The
analysis gives the test score as a linear function of B and D,

.536b + .701d

whereas it was really a product. But the analysis was
correct in omitting L. Similarly, the analyses into the
other factors can be compared with the actual formulae,
and in almost every case the factorial analysis, except for
being linear, is in agreement with the actual facts. Tests 5
and 6, true, appear in the analysis to omit factors L and D
respectively, although these dimensions figured in their
formulae. But it would appear that they were swamped
by reason of the other dimension in the formulae being
squared; and also possibly the specific and error factors
we added did something towards obscuring smaller details.
Also the process of “guessing” communalities, though
innocuous in a battery of many tests, is a source of
considerable inaccuracy when, as here, the tests are few.

5. Box dimensions as factors.—We can now explain the
particular reason for selecting the primary factors, and not 
the reference vectors, as our fundamental entities. The 
fundamental entities in the present example can reason- 
ably be said to be the length, breadth, and depth of the 
boxes, given in the table on page 178. Now, the columns 
of that table are correlated with one another, as the reader 
can readily check, the correlation coefficients being— 


L with B, .589
L with D, .144
B with D, .204


These correlations are due to the fact that a long box 
naturally tends to be large in all its dimensions. It could, 
of course, be very, very shallow, but usually it is deep and 
broad. 
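The three correlations can be verified with a few lines of arithmetic. This is a sketch under the assumption that the eight boxes have the dimensions implied by the table of test scores; the function name `correlation` is our own.

```python
import math

# Dimensions of the eight boxes (assumed: those implied by the printed scores).
LENGTH = [2, 3, 3, 6, 4, 5, 5, 4]
BREADTH = [2, 2, 2, 3, 4, 3, 4, 4]
DEPTH = [1, 3, 2, 2, 2, 1, 3, 2]


def correlation(a, b):
    """Product-moment correlation between two equally long lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    products = sum(x * y for x, y in zip(dev_a, dev_b))
    squares_a = sum(x * x for x in dev_a)
    squares_b = sum(y * y for y in dev_b)
    return products / math.sqrt(squares_a * squares_b)


r_lb = correlation(LENGTH, BREADTH)
r_ld = correlation(LENGTH, DEPTH)
r_bd = correlation(BREADTH, DEPTH)
print(round(r_lb, 3), round(r_ld, 3), round(r_bd, 3))
```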

The reference vectors were, it is true, correlated, but 
negatively. They were at obtuse angles with one another 
(see page 174) and obtuse angles have negative cosines 
corresponding to negative correlations. So the reference
vectors do not correspond to the fundamental dimensions
length, breadth, and depth. 

What, then, are the angles—and hence the correlations— 
between the primary factors? We shall find that they 
are acute angles, and their cosines agree reasonably well 
with the above correlations between the length, breadth, 
and depth. The algebraic method of finding these angles 
is given in the mathematical appendix, but it is perhaps 
desirable to give a less technical account of it here. We 
need the direction-cosines of the primary factors, that is, 
the cosines of the angles they make with the orthogonal 
centroid axes. Each primary factor is the intersection 
of n — 1 hyperplanes—in our simple case is the intersection 
of two planes. 

In n-dimensional geometry a linear equation defines a 
hyperplane of n — 1 dimensions. For example, in a plane 
of two dimensions a linear equation is a line (of one dimen- 
sion)—hence the name linear. But in a space of three 
dimensions a “linear” equation like ax + by + cz =d 
is a plane. Two such equations define the line which is 
the intersection of two planes.

Now, the equations of the three planes which form the 
triangular pyramid of which we have previously spoken 
are just those equations we have already obtained and 
used in our example, viz. : 

\[
\begin{aligned}
.405x + .426y + .809z &= 0 \\
.464x + .076y - .883z &= 0 \\
.338x - .916y + .215z &= 0
\end{aligned}
\]


These equations taken two at a time define the three 
edges of the pyramid, which are our primary factors, and 
if we express each pair in the form— 


\[
\frac{x}{a} = \frac{y}{b} = \frac{z}{c}
\]


then the direction cosines are proportional to a, b, and c, 
which only require normalizing to be the direction cosines. 
When the direction cosines are found in this way, and 
written in columns to form a matrix, they prove to have 
the values:

\[
\begin{pmatrix}
.797 & .835 & .503 \\
.400 & .187 & -.843 \\
.453 & -.517 & .192
\end{pmatrix}
= (\Lambda')^{-1} D
\]


This is the rotating matrix to obtain the projections, i.e. 
the structure, on the primary factors, and if the centroid 
loadings on page 172 are post-multiplied by this there 
results the table we have already quoted on page 175. 

The above matrix, premultiplied by its transpose, gives 
the cosines of the angles between the primary factors. We 
obtain— 


\[
\begin{pmatrix}
1 & .506 & .150 \\
.506 & 1 & .164 \\
.150 & .164 & 1
\end{pmatrix}
= DC^{-1}D
\]


Compare these with the correlations between the columns 
of dimensions of the boxes, viz. : 


\[
\begin{pmatrix}
1 & .589 & .144 \\
.589 & 1 & .204 \\
.144 & .204 & 1
\end{pmatrix}
\]


The resemblance is quite good, and shows that it is the 
primary factors, and not the reference vectors, which 
represent those fundamental although correlated dimen- 
sions of length, breadth, and depth in the boxes. 
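The geometry of this section can be reproduced numerically. In three dimensions the line in which two planes through the origin intersect is perpendicular to both plane normals, so it lies along their cross product. The sketch below uses that fact as a convenient substitute for solving each pair of equations by hand; the dictionary and function names are illustrative only.

```python
import math

# Normals of the three planes, i.e. the reference vectors L', B', D'.
REFERENCE = {
    "L": (0.405, 0.426, 0.809),
    "B": (0.464, 0.076, -0.883),
    "D": (0.338, -0.916, 0.215),
}


def cross(u, v):
    """Vector product of two three-dimensional vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(u):
    length = math.sqrt(dot(u, u))
    return tuple(a / length for a in u)


def primary_factor(name):
    """Edge of the pyramid lying in the two planes other than plane `name`."""
    others = [vec for key, vec in REFERENCE.items() if key != name]
    edge = normalize(cross(others[0], others[1]))
    # Choose the direction making an acute angle with the factor's own reference vector.
    if dot(edge, REFERENCE[name]) < 0:
        edge = tuple(-a for a in edge)
    return edge


PRIMARY = {name: primary_factor(name) for name in REFERENCE}
cosines = {
    pair: dot(PRIMARY[pair[0]], PRIMARY[pair[1]])
    for pair in (("L", "B"), ("L", "D"), ("B", "D"))
}
for pair, value in cosines.items():
    print(pair, round(value, 3))
```

The cosines so obtained may be set beside the correlations .589, .144, and .204 between the box dimensions.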

6. Criticisms of simple structure.—Thurstone's argument
is then, of course, that as this process of analysis leads to 
fundamental real entities in the case of the boxes (and 
also in his “trapezium” example, Thurstone, 19444. 
page 84, with four oblique factors), it may be presumed to 
give us fundamental entities when it is applied to mental 
measurements. And I confess that the argument is very
strong. 

My fears or doubts arise from the possibility that the 
argument cannot legitimately be reversed in this way. 
There is no doubt that if artificial test scores are made up 
with a certain number of common factors, simple structure 
(oblique if necessary) can be reached and the factors
identified. But are there other ways in which the test
scores could have been made? Spearman’s argument was 
a similar reversal. If test scores are made with only one 
common factor, then zero tetrad-differences result. But 
zero tetrad-differences can be approached as closely as we 
like by samples of a large number of small factors, with 
very few indeed common to all the tests. 

However, Thurstone’s simple structure is a much more 
complex phenomenon than Spearman’s hierarchical order, 
and yet he seems to have had no great difficulty in finding 
batteries of tests which give simple structure to a reason- 
able approximation. I am not sceptical, merely cautious, 
and admittedly much impressed by Thurstone’s ability 
both in the mathematical treatment and in the devising 
of experiments. 

Thurstone might, I think, put his case in this way. He 
assembles a battery of tests which to his psychological 
intuition appear to contain such and such psychological 
factors, some being memory tests, some numerical, etc.,
etc., no test, however, containing (to his mind) all these
expected factors. He then submits their correlations to 
his calculations, reaches oblique simple structure, and 
compares this analysis with his psychological expectation. 
If there is agreement, he feels confirmed both in his psychology
and in the efficacy of his method of finding factors
mathematically. Usually there will not be complete 
agreement, and he is led to modify his psychological ideas 
somewhat, in a certain direction. To test the truth of these 
further ideas he again makes and analyses a battery. 
Especially he looks to see if the same factors turn up in 
various batteries. He uses his analyses as guides to 
modifications of his psychological hypotheses, or as con- 
firmation of them. In Great Britain Thurstone's hypo- 
thesis of simple structure has been, I think it is correct to 
say, rather ignored than criticized. Most British psycho- 
logists have imbibed during their education a belief in and 
a partiality for “Spearman's g,” a factor apparently
abolished by Thurstone. Since his work on second-order 
factors rehabilitates g, this objection may disappear. 
Reyburn and Taylor of South Africa have, however,
criticized simple structure shrewdly (19434, and a later
paper by Reyburn and Raath, 1949) even although they 
themselves do not insist on a g (see 19414, pages 253, 254, 
258). 

An early form of response to Thurstone’s work was to 
show that his batteries could also be analysed after Spear- 
man's fashion. Holzinger and Harman (1938), using the 
Bifactor method, reanalysed the data of Thurstone’s 
Primary Mental Abilities and found an important general 
factor due, as they truly say, “to our hypothesis of its 
existence and the essentially positive correlations through- 
out.” Spearman (1939a) in a paper entitled Thurstone’s 
Work Reworked reached much the same analysis, and raised 
certain practical or experimental objections, claiming that 
his g had merely been submerged in a sea of error. But 
there is more in it than that. As I said in my contribution 
to the Reading University Symposium (1939) Thurstone 
could correct all the blemishes pointed out by Spearman 
and would still be able to attain simple structure. I said 
on that occasion that however juries in America and in 
Britain might differ at present, the larger jury of the future 
would decide by noting whether Spearman’s or Thurstone’s 
system had proved most useful in the hands of the prac- 
tising psychologist. I now think that they will certainly 
also consider which set of factors has proved most invariant 
and most real. Very likely the two criteria may lead to 
the same verdict. But for the present the two rival claims 
are in the position described by the Scottish legal phrase, 
“taken ad avizandum.” 

7. Application of multiple-factor analysis to industrial test 
data.—Dr. R. Harper, with various co-workers, has applied 
these methods of factor analysis, begun in connexion with 
psychological tests, to tests of a physical kind on various 
substances during their manufacture. In Nature of 
November 20th, 1948, Harper and Baron wrote: “ In 
industrial physics there are occasions when empirical tests 
are employed the exact meaning of which is not fully under- 
stood, and where the interrelationships between the tests 
could profitably be studied by similar means” to those 
used in psychology, and they described a centroid analysis,
without rotation, of rheological measurements on cheese.
In the British Journal of Applied Physics of January, 1950, 
Harper, Kent, and Blair gave an account of the factorial 
analysis of ten tests (seven rheological and three electrical) 
on a group of plastics (polyvinyl-chloride-plasticizer mixes). 
They made a centroid analysis, with four iterations, took 
out three factors, and rotated them orthogonally to 
maximize the number of near-zero loadings. They tried 
also other rotations, including one to an approximate 
oblique simple structure, and suggest interpretations of the 
factors arrived at. 


CHAPTER XIII 
SECOND-ORDER FACTORS 


1. A second-order general factor—The reason why the 
factors arrived at in the “box” example were correlated
was that large boxes tend to have all their dimensions 
large. There is a typical shape for a box, often departed 
from, yet seldom to an extreme degree. Therefore the 
length, breadth, and depth of a series of boxes are corre- 
lated, and so also are Thurstone’s primary factors in such 
a case. There is a size factor in boxes, a general factor 
which does not appear as a first-order factor (those we 
have been dealing with) in Thurstone’s analysis, but 
causes these primary factors to be correlated. Possibly, 
therefore, when oblique factors appear in the factorial 
analysis of psychological tests, there is a hidden general 
factor causing the obliquity. This factor or factors (for 
there might be more than one) can be arrived at by analys- 
ing the first-order factors, into what Thurstone calls 
second-order factors, factors of the factors. 

Of course, whether such a procedure could be justified 
by the reliability of the original experimental data is very 
doubtful in most psychological experiments. The super- 
structure of theory and calculation raised upon those data 
is already, many would urge, perhaps rather top-heavy, and 
to add a second storey unwise. But we should not, I think, 
let this practical question deter us from examining what is 
undoubtedly a very interesting and illuminating suggestion, 
which may turn out to be the means of reconciling and 
integrating various theories of the structure of the mind. 

If we take the primary factors of our “ box ” example of 
Chapter XII, they were correlated as shown in this matrix : 


\[
\begin{pmatrix}
1 & .506 & .150 \\
.506 & 1 & .164 \\
.150 & .164 & 1
\end{pmatrix}
\]

If we analyse these in their turn into a general factor
and specifics we obtain, using the formula

\[
g \text{ saturation of } a = \sqrt{\frac{r_{ab}\, r_{ac}}{r_{bc}}}
\]

the saturations of the primary factors with a second-order
g as .680, .744, and .220, and each primary factor will
also have a factor specific. We have now replaced the
analysis of the original tests into three oblique factors by 
an analysis into four orthogonal factors, one of them 
general to the oblique factors and presumably also general 
to the original tests, though that we have still to inquire 
into. We must also inquire into the relationship of the 
specifics of the original tests to these second-order factors, 
which are no longer in the original three-dimensional 
common-factor space, but in a new space of four dimensions.
Are the original test-specifics orthogonal to this
new space ? 

With only three oblique factors, an analysis into one g 
is always possible (except in the Heywood case, which will 
often occur among oblique factors). If there had been 
four or more oblique factors, we would have had to use more 
second-order general factors unless the tetrad-differences 
were zero. Thurstone’s “ trapezium ” example already 
referred to had four oblique factors, and his article should 
be consulted by the interested. 
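The second-order saturations for the box example follow from the three correlations between the primary factors. The sketch below applies the usual three-variable formula, in which the saturation of a is the square root of r_ab r_ac / r_bc; the names are our own. A saturation above unity would signal the Heywood case, in which the square root for the specific could not be taken.

```python
import math

# Correlations between the primary factors of the box example.
R = {("L", "B"): 0.506, ("L", "D"): 0.150, ("B", "D"): 0.164}


def r(a, b):
    """Look up a correlation regardless of the order of the two names."""
    return R[(a, b)] if (a, b) in R else R[(b, a)]


def g_saturation(a, b, c):
    """Saturation of factor a with g, given the other two factors b and c."""
    return math.sqrt(r(a, b) * r(a, c) / r(b, c))


saturations = {
    "L": g_saturation("L", "B", "D"),
    "B": g_saturation("B", "L", "D"),
    "D": g_saturation("D", "L", "B"),
}

# Each primary factor also carries a specific, orthogonal to g.
specifics = {name: math.sqrt(1 - g * g) for name, g in saturations.items()}

for name in saturations:
    print(name, round(saturations[name], 3), round(specifics[name], 3))
```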

2. Its correlations with the tests.—Let us turn now to the
question what the correlations are between the seven
original tests and the above second-order g. To obtain
these Thurstone uses an argument equivalent to the
following:

We may first note that each reference vector makes an
acute angle with its own primary factor, but is at right
angles to every other primary factor, for these are all
contained in the hyperplane to which it is orthogonal.
The cosines of the angles can be obtained by premultiplying
the rotation matrix of the reference vectors by
the transpose of the rotation matrix of the primary
factors,

\[
D\Lambda^{-1}\Lambda = D
\]

\[
\begin{pmatrix}
.797 & .400 & .453 \\
.835 & .187 & -.517 \\
.503 & -.843 & .192
\end{pmatrix}
\begin{pmatrix}
.405 & .464 & .338 \\
.426 & .076 & -.916 \\
.809 & -.883 & .215
\end{pmatrix}
=
\begin{pmatrix}
.860 & \cdot & \cdot \\
\cdot & .858 & \cdot \\
\cdot & \cdot & .983
\end{pmatrix}
\]

These cosines in the diagonal of the matrix D give us the
angles 31°, 31°, and 11° which we have already mentioned
on page 177 as the angles between each primary factor and
its own reference vector.

Each row of the first of the above matrices represents 
the projections of the primary factor on to the orthogonal 
centroid axes. These are, in fact, the loadings of the prim- 
ary factors, thought of as imaginary or possible tests, 
in the orthogonal centroid factors I, II, and III. Following 
Thurstone, we add these three rows below the seven rows of 
our original seven real tests, extending the matrix F in 
length thus : 


           I       II      III       r_g

1        .449    -.682    .165      .211
2        .825    -.478   -.129      .574
3        .906     .336    .020      .787
4        .846     .133    .457      .666    (wanted)
5        .808     .208   -.412      .719
6        .697     .336    .335      .597
7        .767     .173   -.468      .683

L        .797     .400    .453      .680
B        .835     .187   -.517      .744    (known)
D        .503    -.843    .192      .220


This lengthened matrix we want to post-multiply by
a column vector (ψ in Thurstone's notation) to give the
correlations of the tests, including the imaginary tests
L, B, and D, with the second-order g. In other words, we
want to know by what weights each column must be multiplied
so that the weighted sum of each row is the correlation
of that test with g. Suppose these weights are u, v,
and w. Since we already know from our second-order
analysis what r_g is for each of the primaries L, B, and D,
we have three equations for u, v, and w, the solution of
which gives us their values. We have

\[
\begin{aligned}
.797u + .400v + .453w &= .680 \\
.835u + .187v - .517w &= .744 \\
.503u - .843v + .192w &= .220
\end{aligned}
\]

and these equations can be solved in the usual way, if
the reader wishes. The values are .798, .198, and -.077.
A closer examination of them, however, which can be
most readily expressed in matrix notation, leads to an
easier plan, especially desirable if the number of primary
factors were greater. In matrix form the above equations
are

\[
T\psi = r_g,
\]

whence

\[
\psi = T^{-1} r_g,
\]

and since T is merely a short notation for DΛ⁻¹ we have

\[
\psi = (D\Lambda^{-1})^{-1} r_g = \Lambda D^{-1} r_g.
\]

That is to say, the centroid loadings F of the seven tests
have to be post-multiplied by this, giving a matrix (a
single column)

\[
F\psi = F\Lambda D^{-1} r_g.
\]

But FΛ we already know. It is (see page 173) the simple
structure V on the reference vectors. So we merely have
to multiply the columns of V by D⁻¹r_g and add the rows to
get the correlation of each test with g. These multipliers
are, that is to say:

.680 ÷ .860 = .791
.744 ÷ .858 = .867
.220 ÷ .983 = .224
The results are the same as by the former method, except 
for discrepancies due to rounding off decimals, and are 
given to the right of the preceding table. 
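Both routes to the g correlations can be compared in a short calculation. The sketch below solves the three equations for the weights by Cramer's rule and then repeats the work by the shortcut of weighting the columns of V. It is a minimal illustration: the helper names are our own, and V is recomputed here as F times the rotating matrix rather than copied from the table.

```python
# Rows of T: direction cosines of the primary factors L, B, D.
T = [
    [0.797, 0.400, 0.453],
    [0.835, 0.187, -0.517],
    [0.503, -0.843, 0.192],
]
R_G = [0.680, 0.744, 0.220]         # second-order g saturations of L, B, D
D_DIAGONAL = [0.860, 0.858, 0.983]  # cosines between primaries and reference vectors

F = [
    [0.449, -0.682, 0.165],
    [0.825, -0.478, -0.129],
    [0.906, 0.336, 0.020],
    [0.846, 0.133, 0.457],
    [0.808, 0.208, -0.412],
    [0.697, 0.336, 0.335],
    [0.767, 0.173, -0.468],
]
LAMBDA = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]


def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def solve3(m, rhs):
    """Solve three linear equations by Cramer's rule."""
    base = det3(m)
    solution = []
    for k in range(3):
        replaced = [row[:k] + [rhs[i]] + row[k + 1:] for i, row in enumerate(m)]
        solution.append(det3(replaced) / base)
    return solution


# First method: solve for the weights u, v, w and apply them to F.
weights = solve3(T, R_G)
g_direct = [sum(f * w for f, w in zip(row, weights)) for row in F]

# Second method: weight the columns of V = F * LAMBDA by r_g / d.
V = [[sum(f * l for f, l in zip(row, col)) for col in zip(*LAMBDA)] for row in F]
multipliers = [r / d for r, d in zip(R_G, D_DIAGONAL)]
g_shortcut = [sum(v * m for v, m in zip(row, multipliers)) for row in V]

print([round(w, 3) for w in weights])
print([round(g, 3) for g in g_direct])
print([round(g, 3) for g in g_shortcut])
```

With more primary factors the shortcut avoids solving a larger set of simultaneous equations, which is the advantage claimed for it in the text.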
3. A g plus an orthogonal simple structure.—In his own
examples, Thurstone has not calculated the loadings of the
original tests with the other orthogonal second-order
factors, the factor specifics. This can, however, clearly be
done by the same method as above. Since the correlations
of the general factor with the three oblique factors are
.680, .744, and .220, the correlations of each factor specific
with its own oblique factor are .733, .668, and .975. For
example, .733 = √(1 − .680²). The second-order analysis
therefore is:

\[
\begin{pmatrix}
.680 & .733 & \cdot & \cdot \\
.744 & \cdot & .668 & \cdot \\
.220 & \cdot & \cdot & .975
\end{pmatrix}
= E
\]


Dividing the rows by the divisors already mentioned, viz.
.860, .858, and .983, we obtain the matrix:

\[
\begin{pmatrix}
.791 & .853 & \cdot & \cdot \\
.867 & \cdot & .779 & \cdot \\
.224 & \cdot & \cdot & .992
\end{pmatrix}
= D^{-1}E
\]


and when the matrix V is post-multiplied by this we 
obtain the following analysis of the original seven tests 
into a general factor plus an orthogonal simple structure 
of three factors : 


General Factor plus Simple Structure
G = VD⁻¹E

Test      g       λ       β       δ
1        .211    .021    .009    .805
2        .574    .022    .358    .683
3        .787    .449    .333    .003
4        .666    .656   -.001    .260
5        .719    .071    .588   -.006
6        .597    .593    .041    .000
7        .683    .005    .609    .000

The zero or very small entries in λ, β, and δ are in the
same places as they are for L', B', and D' in the oblique
simple structure V (see page 173). What we have now
done is to analyse the box data into four orthogonal
factors corresponding to size, and ratios of length, breadth,
and depth. In terms of our pyramidal geometrical 
analogy we have “ taken out a general factor” by depress- 
ing the ceiling of our room, squashing the pyramid down 
until its three plane sides are at right angles to each other. 

The above structure, being on orthogonal factors, is also 
a pattern, so that the inner products of its rows ought to 
give the correlation coefficients with the same accuracy, if 
we have kept enough decimal places in our calculations, as 
do the rows of the centroid analysis F: and so they do. 
For example, the correlation between Tests 1 and 2 is, 
from F, 

\[
.449 \times .825 + .682 \times .478 - .165 \times .129 = .675
\]

and from G it is

\[
.211 \times .574 + .021 \times .022 + .009 \times .358 + .805 \times .683 = .675
\]

The “experimental” value was .728, the difference of
.053 being due to the inaccuracy of the guessed communalities,
or in an actual experimental set of data to
sampling error and to the rank of the matrix not being
exactly three.
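The agreement between the two inner products can be confirmed by building G from the earlier matrices. This sketch is only illustrative: it recomputes V as F times the rotating matrix, takes the g saturations and the diagonal of D as printed, and uses helper names of our own.

```python
import math

F = [
    [0.449, -0.682, 0.165],
    [0.825, -0.478, -0.129],
    [0.906, 0.336, 0.020],
    [0.846, 0.133, 0.457],
    [0.808, 0.208, -0.412],
    [0.697, 0.336, 0.335],
    [0.767, 0.173, -0.468],
]
LAMBDA = [
    [0.405, 0.464, 0.338],
    [0.426, 0.076, -0.916],
    [0.809, -0.883, 0.215],
]
R_G = [0.680, 0.744, 0.220]         # g saturations of the primaries L, B, D
D_DIAGONAL = [0.860, 0.858, 0.983]  # diagonal of the matrix D

# Simple structure on the reference vectors.
V = [[sum(f * l for f, l in zip(row, col)) for col in zip(*LAMBDA)] for row in F]


def general_plus_specifics(v_row):
    """One row of G: the g loading followed by the three orthogonal factor loadings."""
    general = sum(v * r / d for v, r, d in zip(v_row, R_G, D_DIAGONAL))
    others = [
        v * math.sqrt(1 - r * r) / d for v, r, d in zip(v_row, R_G, D_DIAGONAL)
    ]
    return [general] + others


G = [general_plus_specifics(row) for row in V]


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


r12_from_F = inner(F[0], F[1])
r12_from_G = inner(G[0], G[1])
print(round(r12_from_F, 3), round(r12_from_G, 3))
```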

We can see here a distinct step towards a reconciliation 
between the analyses of the Spearman school and those 
of Thurstone using oblique factors. But we must not 
forget that if the oblique factors are not oblique enough, 
the Heywood embarrassment will occur, and a second- 
order g be impossible. The orthogonal factors of G are 
more convenient to work with statistically, but it is possible 
that the oblique factors of V are more realistic both in our 
artificial box example and in psychology. They corre- 
sponded in our case to the actual length, breadth, and 
depth of the boxes. The factors λ, β, and δ of matrix G
correspond to these dimensions after the boxes have all 
been equalized in “ size.” 


PART IV 
THE ESTIMATION OF FACTORS * 


* This use of the word “estimation” has been criticized. By 
statisticians the word is restricted to mean the estimation of un- 
known parameters from a sample, a process of inference from 
sample to parent population. Here the word is used to mean the 
“estimation” of a man’s scores in a test (or vocation or examina- 
tion) to which he has not been subjected, from a knowledge of his 
behaviour in other tests. Factors are imaginary tests and a man’s 
score in them can be “estimated” in the same way. I would use 
another word if I could, but “estimation” seems the natural
expression. Besides, I think the two meanings are fundamentally
alike. 




CHAPTER XIV 
REGRESSION AND MULTIPLE CORRELATION


1. Correlation coefficient as estimation coefficient.—A correlation
coefficient indicates the degree of resemblance
between two lists of marks: and therefore it also indicates
the confidence with which we can estimate a man's position
in one such list x if we know his position in the other y.
If the correlation between two lists is perfect (r = 1),
we know that his standardized score* in the one list is
exactly the same as in the other (x = y).

If the correlation between the two lists is zero (r = 0), 
then the knowledge of a man’s position in the one list tells 
us nothing whatever about his position in the other list. 
If we are compelled to make an estimate of that, we can 
only fall back on our knowledge that most men are near 
the average and few men are very good or very bad in any 
quality. We have, therefore, most chance of being correct 
if we guess that this man is average in the unknown test. 
(x̂ = 0. The average mark we have agreed to call zero;
marks above average, positive; marks below average, 
negative.) 

In the first case, when r = 1, we are justified in equating
his unknown score x to his known score y:

\[
\hat{x} = y
\]

In the second case, when r = 0, we are compelled by our
ignorance to take refuge in:

\[
\hat{x} = 0 \text{, or average.}
\]

Both these statements can be summed up in the one
statement:

\[
\hat{x} = ry
\]

where the circumflex mark over the x is meant to indicate
that this is an estimated, not a measured, value. If, now,
we consider a case between these, where the correlation is
neither perfect nor zero, it can be shown that this equation
still holds, provided each score is measured in standard
deviation units. Since r is always a fraction, this means
that we always estimate his unknown x score as being
nearer the average than his known y score. That is
because we know that men tend to be average men. If
this man's y score is high, say

\[
y = 2
\]

(two standard deviations above the average), and if the
correlation between the qualities x and y is known to be
r = .5, we guess his position in the x test as being

\[
\hat{x} = ry = .5 \times 2 = 1
\]

i.e. only one standard deviation above the average. This
is a guess influenced by our two pieces of knowledge,
(1) that he did very well in Test y, which is correlated with
Test x, and (2) that most men get round about an average
score (zero). It is a compromise, an estimate. It will
often be wrong; indeed, very seldom will it be exactly
right. But it will be right on the average, it will as often
be an underestimate as an overestimate, in each array
of men who are alike in y. The correlation coefficient,
then, is an estimation coefficient for tests measured in
standard deviation units.

* A test score in what follows always means a standardized score
unless the contrary is stated. But estimates are not in standard
measure in general.


2. Three tests.—Suppose now that we have three tests
whose intercorrelations are known, and that a man's scores
on two of them, y and z, are known. We wish to estimate
what his score will most probably be in the other test, x.
x need not be a test in the ordinary sense of the word, but
may be an occupation for which the man is a candidate
or entrant. According as we use his known y or his
known z score, we shall have two estimates for his x score.
To fix our ideas, let us take definite values for the correlations,
say:

\[
r_{xy} = .7, \qquad r_{xz} = .5, \qquad r_{yz} = .3
\]

The two estimates for his x are then

\[
\hat{x} = .7y
\]

\[
\hat{x} = .5z
\]

and of these we shall have rather more confidence in the
estimate associated with the higher correlation. But we
ought to have still more confidence in an estimate derived
from both y and z. Such an estimate could use not only
the knowledge that y and z are correlated with x, but also
the knowledge that they are correlated to an extent of
r = .3 with each other. Just to take the average of the
above two separate estimates will not utilize this knowledge,
nor will it utilize the fact that the estimate from y (r = .7)
is more worthy of confidence than the estimate from
z (r = .5).

What we want is to know how to combine the two scores
y and z into a weighted total

(by + cz)

which will have the highest possible correlation with x.
Such a correlation of a best-weighted total with another
test is called a multiple correlation. From such a weighted
total of his two known scores we could then estimate the
man's x score more accurately than from either the y or
the z score alone. It must use all the information we have,
including our information that y and z correlate to an
amount r = .3.

3. The straight sum and the pooling square.—In order to
answer this question, we shall first consider the problem
of finding the correlation of the straight unweighted sum
of the scores y + z with x. This is the simplest form of a
problem to which a general answer was given by Professor
Spearman (Spearman, 1913).

We shall put his formula into a very simple form, which
we may call a pooling square. In our present instance we
want to find the correlation of y + z with x (all of these
being, we are assuming, measured in standard deviation
units). We divide the matrix of correlations by lines
separating the “criterion” x from the “battery” y + z,
thus:

          x   |   y      z
   x     1.0  |   .7     .5
        ------+--------------
   y      .7  |  1.0     .3
   z      .5  |   .3    1.0


In each of the quadrants of this pooling square (with
unities in the diagonal, be it noted), we are going to form
the sum of all the numbers, and we shall indicate these
sums by the letters:

            A  |  C
           ----+----
            C  |  B

(where C is the sum of the cross-correlations between the
battery y + z and the criterion x, which can be regarded
as a second battery of one test only).

Then the correlation of x with y + z is equal to

$$\frac{C}{\sqrt{A \times B}}$$

which in our present example is

$$\frac{.7 + .5}{\sqrt{1 \times (1 + .3 + .3 + 1)}} = \frac{1.2}{\sqrt{2.6}} = .744$$

so that the battery (y + z) has a rather better correlation
(.744) with x than has either of its members (.7 and .5).
From the straight sum of the man’s scores in the two tests
y and z we can therefore in this case get a better estimate
of his score in x than we could get from either alone.
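The rule of the pooling square for a straight sum can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `pooled_correlation` and its argument layout are assumptions introduced here, and all variables are taken to be in standard measure.

```python
from math import sqrt

def pooled_correlation(r_criterion, r_battery):
    """Correlation of a criterion with the unweighted sum of a battery.

    r_criterion: correlations of each battery test with the criterion.
    r_battery:   square matrix of battery intercorrelations (unit diagonal).
    """
    a = 1.0                                   # criterion quadrant (a single unity)
    c = sum(r_criterion)                      # quadrant of cross-correlations
    b = sum(sum(row) for row in r_battery)    # battery quadrant
    return c / sqrt(a * b)

r = pooled_correlation([0.7, 0.5], [[1.0, 0.3], [0.3, 1.0]])
print(round(r, 3))  # 0.744
```

The same function serves for a battery of any size, since only the sums of the quadrants enter the formula.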

4. The pooling square with weights. We want, however,
to know whether a weighted sum of y and z will give a still
higher combined correlation with x. With sufficient
patience, we could answer this by trial and error, for the
pooling square enables us to find almost as easily the
correlation of a weighted battery with the criterion.* Let
us, for example, try the battery 3y + z. For this purpose

* The pooling square can also be used to find the correlations or
covariances of weighted batteries with one another. Elegant
developments are Hotelling’s ideas of the most predictable criterion
(1935a) and of vector correlation (1936).

we write the weights along both margins of the pooling
square:

                       3      1
                x      y      z
           x   1.0  |  .7     .5
              ------+-------------
       3   y    .7  | 1.0     .3
       1   z    .5  |  .3    1.0


and multiply both the rows and the columns by these weights
before forming the sums A, B, and C. The result of the
multiplications in our case is:

                x      y      z
           x   1.0  | 2.1     .5              1.0  |  2.6
              ------+-------------     =     ------+-------
           y   2.1  | 9.0     .9              2.6  | 11.8
           z    .5  |  .9    1.0

and we therefore have

$$\text{correlation} = \frac{2.6}{\sqrt{11.8}} = .757$$

a higher value than .744 given by the simple sum. So we
have improved our estimation of the man’s x score, and
estimates made by taking 3y + z would correlate .757
with the measured values of x.
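The search by trial and error can be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_correlation` and the grid of trial weights are inventions for the purpose, and the weight of z is held at unity because only the ratio of the weights matters.

```python
from math import sqrt

R_XY, R_XZ, R_YZ = 0.7, 0.5, 0.3   # correlations of the worked example

def weighted_correlation(b, c):
    """Correlation of x with the weighted sum b*y + c*z via the pooling square."""
    cross = b * R_XY + c * R_XZ                     # quadrant C
    battery = b * b + 2 * b * c * R_YZ + c * c      # quadrant B
    return cross / sqrt(battery)                    # quadrant A is unity

print(round(weighted_correlation(1, 1), 3))   # straight sum: 0.744
print(round(weighted_correlation(3, 1), 3))   # the battery 3y + z: 0.757

# Crude trial and error: hold the weight of z at 1 and vary the weight of y.
trials = [k / 100 for k in range(1, 1001)]
best_b = max(trials, key=lambda b: weighted_correlation(b, 1))
print(best_b, round(weighted_correlation(best_b, 1), 3))
```

The best trial weight found in this way lies close to the ratio .55 : .29, and no trial gives a correlation above about .763.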

5. Regression coefficients and multiple correlation.
Similarly we could try other weights for y and z and search
by trial and error for the best. There is, however, a general
answer to this question, namely that the best weights for
y and z are proportional to certain minor determinants of
the correlation matrix. The weight for y is proportional to
the minor left when we cross out the criterion column and
the y row, the weight for z is proportional to minus the
minor left when we similarly cross out the criterion column
and the z row. The matrix of correlations with the
criterion column deleted being:

           x    .7     .5
           y   1.0     .3
           z    .3    1.0

the weight for y is therefore proportional to:

$$\begin{vmatrix} .7 & .5 \\ .3 & 1.0 \end{vmatrix} = .55$$

and that for z is proportional to:

$$-\begin{vmatrix} .7 & .5 \\ 1.0 & .3 \end{vmatrix} = .29$$

that is, they are as .55 : .29. To make these weights not
merely proportional but absolute values we must divide
each of them by the minor left when the row and column
concerned with the “criterion” x are deleted, namely:

$$\begin{vmatrix} 1.0 & .3 \\ .3 & 1.0 \end{vmatrix} = .91$$

so that these absolute best weights, for which the technical
name is “regression coefficients,” are

$$\frac{.55}{.91}\,y + \frac{.29}{.91}\,z$$

or .6044y + .3187z.


We are inviting the reader to take this method of calculating
the regression coefficients on trust; but he can at least
satisfy himself that when applied to the pooling square they
give a higher correlation of battery with criterion than any
other weights do. The result of multiplying the y column
and row by .6044, and the z column and row by .3187, is
the following:

                      .6044    .3187
              1.0  |    .7       .5
             ------+------------------
     .6044     .7  |   1.0       .3
     .3187     .5  |    .3      1.0

becomes

           1.0000  |  .4231    .1593
           --------+------------------
            .4231  |  .3653    .0578
            .1593  |  .0578    .1015

which condenses to

           1.0000  |  .5824
           --------+--------
            .5824  |  .5824

$$\text{Multiple correlation} = \frac{.5824}{\sqrt{.5824}} = .763 = r_m, \text{ say,}$$

which is higher than any other weighting will produce, if the reader
cares to try others. Notice the peculiarity of the pooling
square with regression coefficients as weights, that C = B
(.5824 = .5824). We can deduce that the inner product of
the regression coefficients with the correlation coefficients
gives the square of the multiple correlation:

$$.604 \times .7 + .319 \times .5 = .582 = r_m^2$$

Indeed, we can take this as forming one reason for using
.604 and .319, and not any other numbers proportional to
them, although the latter would give the same order of
merit. We want our estimates of x not merely to be as
highly correlated with the true values of x as is possible,
but also to be equal to them on the average in the long
run, in the sense that our overestimations will, in each
array of men who have the same y and z, be as numerous
as our underestimations, and this is achieved by using not
merely .55 and .29 as weights, but .55 ÷ .91 and .29 ÷ .91.
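The rule of minors and the peculiarity C = B can be checked by a short calculation. The following Python is a sketch for the two-test case only; the variable names are assumptions introduced here.

```python
from math import sqrt

r_xy, r_xz, r_yz = 0.7, 0.5, 0.3

# Minors of the matrix with the criterion column deleted (rows x, y, z).
weight_y = r_xy * 1.0 - r_xz * r_yz        # y row crossed out
weight_z = -(r_xy * r_yz - r_xz * 1.0)     # minus the minor with the z row crossed out
divisor = 1.0 - r_yz * r_yz                # criterion row and column deleted

b_y, b_z = weight_y / divisor, weight_z / divisor

# Pooling square with the regression coefficients as weights.
C = b_y * r_xy + b_z * r_xz
B = b_y ** 2 + 2 * b_y * b_z * r_yz + b_z ** 2
r_m = C / sqrt(B)

print(round(b_y, 4), round(b_z, 4))   # 0.6044 0.3187
print(round(C, 4), round(B, 4))       # 0.5824 0.5824
print(round(r_m, 3))                  # 0.763
```

Multiplying both weights by any common factor leaves r_m unchanged but destroys the equality of C and B, which is the point made in the text about absolute as against proportional weights.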

6. Aitken’s method of pivotal condensation. When there
are more than two tests y and z in the battery, the application
of the above rules becomes increasingly laborious. It
is desirable, therefore, to have a routine method of calculating
regression coefficients which will give the result as
easily as possible even in the case of a team of many tests.
The method we shall adopt (Aitken, 1937a) is based upon
the calculation of tetrads, as already used in our Chapter V.
We shall first calculate the above regression coefficients
again by this method. Delete the criterion column in the
matrix of correlations, transfer the criterion row to the bottom,
and write the resulting oblong matrix in the top left-hand
corner of the sheet of calculations, preferably on paper
ruled in squares:

                                                     Check
                                                     Column
              (1.0)     .3    |    -1        .     |    .3
       A        .3     1.0    |     .       -1     |    .3
                .7      .5    |     .        .     |   1.2
      ---------------------------------------------------------
                      (.91)   |    .3       -1     |    .21
       B              1.000   |   .3297  -1.0989   |   .2308
                        .29   |    .7        .     |    .99
      ---------------------------------------------------------
       C                      |   .604     .319    |   .923

On the right of the oblong matrix of correlation coefficients
we rule a middle block of columns of the same
number, here two, and on the right of all a check column.
The columns of the middle block we fill with a pattern
of minus ones diagonally as shown, leaving the other cells
empty,* including the bottom row. In the check column
we write the sum of each row. The top left-hand number
of all we mark as the “pivot.” Slab B of the calculation
is then formed from slab A by writing down, in order as
they come, all the tetrad-differences of which the pivot in
A is one corner. Thus the first row of slab B is calculated
thus:

$$\begin{aligned}
1 \times 1 - .3 \times .3 &= .91 \\
1 \times 0 - .3 \times (-1) &= .3 \\
1 \times (-1) - .3 \times 0 &= -1 \\
1 \times .3 - .3 \times .3 &= .21
\end{aligned}$$

and the row is checked by noting that .21 is the sum of the
others. Immediately below this first row a second version
of it is written, with every member divided by the first
(.91). This is to facilitate the calculation of slab C by
having unity again as a pivot. The second row of slab B is
then formed, beginning with

$$1 \times .5 - .7 \times .3 = .29$$


Throughout the whole calculation, except for the division
of the first row, only one operation needs to be performed,
namely the computing of tetrad-differences, beginning with
the pivot.

The same operation is then repeated to give slab C,
using the modified first row of B, with pivot unity.

This procedure goes on, slab after slab, until no numbers
remain in the left-hand block. There being only three
tests in all in our example, this happens at slab C. The
middle block then gives the regression coefficients .604 and
.319, with their proper signs, all ready for use. Throughout
the calculation the check column detects any blunder in
each row. The check, let me repeat, for I often find this
misunderstood, consists in seeing that the appropriate
tetrad from the sums in the previous slab agrees with the
sum of the new row. Thus .99 is both the sum of its row,
and also the tetrad

$$1 \times 1.2 - .7 \times .3$$

from slab A.

* The dots represent zeros.
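The condensation just described is mechanical enough to be written as a short routine. The Python below is a sketch under assumptions: the function name `aitken_regression` is invented here, the check column is omitted for brevity, and each pivot is converted to unity before the tetrad-differences are formed, so that the arithmetic follows the slabs of the text without claiming to reproduce Aitken's own layout in every detail.

```python
def aitken_regression(r_tests, r_criterion):
    """Regression coefficients by pivotal condensation (check column omitted)."""
    p = len(r_tests)
    # Slab A: test correlations with a diagonal of minus ones on the right,
    # and the criterion row at the bottom with an empty (zero) middle block.
    slab = [list(r_tests[i]) + [-1.0 if j == i else 0.0 for j in range(p)]
            for i in range(p)]
    slab.append(list(r_criterion) + [0.0] * p)

    while len(slab) > 1:
        pivot_row = [v / slab[0][0] for v in slab[0]]   # convert the pivot to unity
        # Tetrad-differences with the pivot as one corner; the first column drops out.
        slab = [[row[j] - row[0] * pivot_row[j] for j in range(1, len(row))]
                for row in slab[1:]]
    return slab[0]   # middle block of the last slab

coefficients = aitken_regression([[1.0, 0.3], [0.3, 1.0]], [0.7, 0.5])
print([round(b, 3) for b in coefficients])  # [0.604, 0.319]
```

Each pass of the loop produces one slab, and the loop stops when only the criterion row remains, that is, when the left-hand block is empty.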

When the number of tests in the battery is large, the 
calculation of the regression coefficients is a laborious 
business, but probably less so by this method than by 
any other. It will be clear to the reader that so long a 
calculation is not worth performing unless the accuracy of 
the original correlation coefficients is high. Only very 
accurate values can stand such repeated multiplication,
etc., without giving untrustworthy results. In other
words, regression coefficients have a rather high standard 
error.* 

7. A larger example. Next we give in full the calculation
of the regression coefficients in a slightly larger example,
though one still much smaller than a practical scheme of
vocational advice would involve. Here z0 is the “occupation,”
and z1, z2, z3, and z4 are tests. To give the
example an air of reality, these and their intercorrelations
are taken from Dr. W. P. Alexander’s experimental study,
Intelligence, Concrete and Abstract (Alexander, 1935).
They were:

    z1  Stanford-Binet test;
    z2  A picture-completion test;
    z3  Thorndike reading test;
    z4  Spearman’s analogies test in geometrical figures.


* Regression weights obtained from one set of data, applied to a 
subsequent set, will not usually give a correlation with the criterion 
as high as that predicted. The probable defect in its square will 
be (Wherry, 1931)— 

(1 — M — % — M), 
where N is the number of persons and M the number of tests. 

† In this, as in other instances where data for small examples are
taken from experimental papers, neither criticism nor comment is
in any way intended. Illustrations are restricted to few tests for
economy of space and clearness of exposition, but in the experiments
from which the data are taken many more tests are employed, and
the purpose may be quite different from that of this book.

But the occupation is a pure invention, for purposes of this
illustration only. The correlation matrix is:

               z0   |   z1      z2      z3      z4
       z0     1.00  |   .72     .63     .58     .41
             -------+-------------------------------
       z1      .72  |  1.00     .39     .69     .49
       z2      .63  |   .39    1.00     .19     .27
       z3      .58  |   .69     .19    1.00     .38
       z4      .41  |   .49     .27     .38    1.00


The fact that we possess these correlations means that we 
have given these tests to a sufficiently large number of 
persons whose ability in the occupation is also known. 
The occupation can be looked upon as another test, in
which marks can be scored. In an actual experiment,
obtaining marks for these persons’ abilities in the occupation
is in fact one of the most difficult parts of the work.
We can now find by Aitken’s method the best weights for
Tests z1 to z4, to make their weighted sum correlate as
highly as possible with z0. For a reason which will be
explained later, I have numbered the tests in the order of
their correlations with the criterion. To make the arithmetic
as easy as possible to follow in an illustration, the
original correlation coefficients are given to two places of
decimals only, and only three places of decimals are kept
at each stage of the calculation. The previous explanation
ought to enable the reader to follow. As an additional
help, take the explanation of the value .454 in the middle of
slab C. It is obtained thus from slab B:

$$1 \times .490 - .079 \times .460 = .454$$


and is typical of all the others. Except for the division
of each first row, only one kind of operation is required
through the whole calculation, which becomes quite
mechanical. The numbers shown on the left in brackets
are the reciprocals of .848, .517, .748, used as multipliers
instead of dividing by the latter numbers, in obtaining the
modified first rows. The process continues until the left-hand
block is empty, when the regression coefficients
appear in the middle block.*

COMPUTATION OF REGRESSION COEFFICIENTS
Aitken’s Modified Method with Each Pivot converted to Unity

                                                                                Check
A               (1)     .39     .69     .49 |      -1       .       .       . |    1.57
                .39       1     .19     .27 |       .      -1       .       . |     .85
                .69     .19       1     .38 |       .       .      -1       . |    1.26
                .49     .27     .38       1 |       .       .       .      -1 |    1.14
                .72     .63     .58     .41 |       .       .       .       . |    2.34

B (1.179)            (.848)   -.079    .079 |    .390      -1       .       . |    .238
                      1.000   -.093    .093 |    .460  -1.179       .       . |    .281
                      -.079    .524    .042 |    .690       .      -1       . |    .177
                       .079    .042    .760 |    .490       .       .      -1 |    .371
                       .349    .083    .057 |  [.720]       .       .       . |   1.209

C (1.936)                    (.517)    .049 |    .726   -.093      -1       . |    .199
                              1.000    .096 |   1.406   -.180  -1.936       . |    .386
                               .049    .753 |    .454    .093       .      -1 |    .349
                               .116    .025 |  [.559]  [.412]       .       . |   1.112

D (1.337)                            (.748) |    .384    .102    .095      -1 |    .329
                                      1.000 |    .514    .136    .128  -1.337 |    .441
                                       .014 |  [.397]  [.433]  [.224]       . |   1.068

E                                           |    .390    .431    .222    .018 |   1.061
                                                   Final Regression Coefficients

* The product of all the unconverted pivots, 1 × .848 × .517 ×
.748, is the value .328 of the determinant:

$$\begin{vmatrix} 1.00 & .39 & .69 & .49 \\ .39 & 1.00 & .19 & .27 \\ .69 & .19 & 1.00 & .38 \\ .49 & .27 & .38 & 1.00 \end{vmatrix}$$

If this alone were wanted, the middle block, and the criterion row,
would, of course, be unnecessary.

The result is that we find that the best prediction of a
man’s probable success in this occupation is given by the
regression equation

$$z_0 = .390 z_1 + .431 z_2 + .222 z_3 + .018 z_4$$

We give a candidate the four tests, reduce his scores
to standard measure by dividing by the known standard
deviation of each test, insert these standard scores into
this equation, and obtain an estimated score for him in
the occupation. Thus the following three young men
could be placed in their probable order of efficiency in this
occupation from their test scores:


Standard Scores in 


z z a 5 2 
Dom 1 
Dick 1 3 “AT 
Harry 2 15 8 6 88 


| 

| 4 

| “1 “3 “3 “a “0 
| 

E 


The multiple correlation of such estimates ẑ0 with the
true values would be obtained by inserting the four
correlation coefficients

        .72    .63    .58    .41

instead of the z’s in the regression equation, and taking
the square root, thus:

$$\sqrt{.390 \times .72 + .431 \times .63 + .222 \times .58 + .018 \times .41} = \sqrt{.68847} = .83 = r_m$$
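The use of the regression equation for placing candidates in order, and for finding the multiple correlation, can be sketched as below. The two candidates and their standard scores are hypothetical, invented purely for illustration, and the function name `estimate` is likewise an assumption.

```python
from math import sqrt

weights = [0.390, 0.431, 0.222, 0.018]     # regression coefficients from the text
criterion_r = [0.72, 0.63, 0.58, 0.41]     # correlations of the tests with z0

def estimate(standard_scores):
    """Estimated criterion score from four standard scores."""
    return sum(w * s for w, s in zip(weights, standard_scores))

# Hypothetical candidates (invented here; not the three young men of the text).
candidates = {"A": [1.0, 0.5, -0.2, 0.0], "B": [0.0, 1.2, 0.4, 1.0]}
ranking = sorted(candidates, key=lambda name: estimate(candidates[name]), reverse=True)
print(ranking)

# The same inner product, with the criterion correlations put in place of the z's.
r_m = sqrt(estimate(criterion_r))
print(round(r_m, 2))   # 0.83
```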
Finally, we can, as we did in the former example, use
the regression weights on a pooling square and see if we
obtain this same multiple correlation of r_m = .83:

                        .390    .431    .222    .018
                1.00 |   .72     .63     .58     .41
               ------+--------------------------------
       .390      .72 |  1.00     .39     .69     .49
       .431      .63 |   .39    1.00     .19     .27
       .222      .58 |   .69     .19    1.00     .38
       .018      .41 |   .49     .27     .38    1.00


It will be remembered that we have to multiply each
row and column by its appropriate weight, and then sum
all the numbers in each quadrant. The easiest way of
doing this in large pooling squares is to multiply the rows
first, then add the columns and multiply the totals by the
column weights, finally adding these products, thus:

Multiply the rows:

                          .390     .431     .222     .018
              1.0000 |     .72      .63      .58      .41
             --------+-------------------------------------
      .390     .2808 |    .3900    .1521    .2691    .1911
      .431     .2715 |    .1681    .4310    .0819    .1164
      .222     .1288 |    .1532    .0422    .2220    .0844
      .018     .0074 |    .0088    .0049    .0068    .0180
             --------+-------------------------------------
      Sums     .6885 |    .7201    .6302    .5798    .4099


If we had kept all decimals these columnar sums would,
since we are using regression coefficients as weights, have
been exactly equal to the top row. With the actual figures
shown, on multiplying the column totals and adding them,
we find that the pooling square condenses to:

           1.0000  |  .6885
           --------+--------
            .6885  |  .6885

$$r_m = \frac{.6885}{\sqrt{.6885}} = \sqrt{.6885} = .83, \text{ as before.}$$

8. Using fewer tests. There is a tendency, which common
sense finds natural, for the regression coefficients of the
tests of a battery to be in the same order of magnitude as
their correlations with the criterion. But this is not invariably
the case, and in the present example, if we compare
the two sets:

    correlations with criterion    .72     .63     .58     .41
    and regression coefficients    .390    .431    .222    .018,

we see that Test 2 has a higher regression coefficient than
Test 1, although a lower criterion correlation. The reason
lies in the high correlation of Test 1 with Test 3, .69.
They measure to that extent the same thing, and when
Test 3 is introduced into the battery it begins to some extent
to put Test 1’s “nose out of joint.”

The boxed numbers in the calculation of Section 7 are
all regression coefficients. If only Test 1 is used, its
regression coefficient is .72. If Tests 1 and 2 are used,
their regression coefficients are .559 and .412. If Tests 1,
2, and 3 are used, their regression coefficients are .397,
.433, and .224. And if all four Tests are used, the four
final numbers are the regression coefficients.

The addition of each test raises the multiple correlation
r_m. We have, for r_m²:

    Test 1                  .72 × .72                               = .5184
    Tests 1 and 2           .72 × .559 + .63 × .412                 = .6622
    Tests 1, 2, and 3       .72 × .397 + .63 × .433 + .58 × .224    = .6882
    Tests 1, 2, 3, and 4    .72 × .390 + .63 × .431 + .58 × .222
                              + .41 × .018                          = .6885
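The growth of the squared multiple correlation as tests are added can be followed by solving the regression equations for the first one, two, three, and four tests in turn. The sketch below uses a plain Gauss-Jordan elimination written for this purpose; the helper `solve` is an assumption introduced here, without pivoting, and is adequate only for small well-conditioned matrices such as this one. Because all decimals are kept, the last figures may differ slightly from the three-place working of the text.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination (no pivoting)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for k in range(n):
        m[k] = [v / m[k][k] for v in m[k]]
        for i in range(n):
            if i != k:
                m[i] = [vi - m[i][k] * vk for vi, vk in zip(m[i], m[k])]
    return [row[n] for row in m]

R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]
c = [0.72, 0.63, 0.58, 0.41]

squared = []
for k in range(1, 5):
    sub = [row[:k] for row in R[:k]]       # correlations among the first k tests
    b = solve(sub, c[:k])                  # their regression coefficients
    squared.append(sum(bi * ci for bi, ci in zip(b, c)))
    print(k, [round(v, 3) for v in b], round(squared[-1], 4))
```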


Although the addition of each test raises the multiple 
correlation, some do so only very little; and our caution 
in ordering the tests in accordance with the magnitude of 
the criterion correlation makes it probable, though not 
certain, that the comparatively useless tests will be the later 
ones. We can at each stage of the calculation pause and see 
whether the test we have just added makes a significant 
addition to the multiple correlation. We do this by an 
analysis of variance (see Lindquist, 1940, Chapter V, or 
other text-book). Consider, for example, the rise in the
squared multiple correlation from .6622 to .6882. Is the
rise statistically significant? To decide this we must know
the number of persons tested, say N = 105.


    Tests                    r_m²     Degrees of    Mean      Ratio F
                                      Freedom       Square

    1 and 2                  .6622        2
    Increment on adding 3    .0260        1         .0260     .0260 ÷ .0031 = 8.4
    Residue                  .3118      101         .0031

    Total                   1.0000      104

The calculation is carried out in the above form, and the
decision whether the increment of r_m² is statistically
significant depends on the size of the ratio F. If it is large
enough, the increase is significant. To decide how large,
consult Table V in Fisher and Yates’s Statistical Tables,
where we find that, with degrees of freedom 1 and 101, a
ratio of 6.88 would be significant at the 1 per cent. point,
i.e. quite highly significant, and 8.4 is even larger than this.
So the increase due to the addition of Test 3 is well worth
while. A similar calculation for the further addition of
Test 4, producing a rise of .0003 in r_m², shows, as might be
expected, that this is not significant, for F is now less than
unity, and Tests 1, 2, and 3 are (with 105 cases) as good
as the whole battery.


    Tests                    r_m²     Degrees of    Mean      Ratio F
                                      Freedom       Square

    1, 2, and 3              .6882        3
    Increment on adding 4    .0003        1         .0003     Less than unity.
    Residue                  .3115      100         .0031

    Total                   1.0000      104
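The two analyses of variance reduce to one small calculation, sketched here in Python. The function name `f_ratio` and its arguments are assumptions made for illustration; the critical value must still be taken from published tables of F.

```python
def f_ratio(r2_before, r2_after, n_persons, n_tests_after):
    """F ratio for the rise in squared multiple correlation on adding one test."""
    increment = r2_after - r2_before                # carries 1 degree of freedom
    residual_df = n_persons - n_tests_after - 1     # degrees of freedom of the residue
    residual_mean_square = (1.0 - r2_after) / residual_df
    return increment / residual_mean_square, residual_df

f3, df3 = f_ratio(0.6622, 0.6882, 105, 3)   # adding Test 3
f4, df4 = f_ratio(0.6882, 0.6885, 105, 4)   # adding Test 4
print(round(f3, 1), df3)
print(round(f4, 2), df4)
```

With N = 105 the first ratio comes out well above the 1 per cent. point of 6.88, and the second is far below unity.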


9. Calculation of a reciprocal matrix. A somewhat longer
method of calculating regression coefficients has two
advantages: it permits the easy calculation of regression
coefficients for any criterion (or many) when once the main
part of the computation is completed, and, what is of great
importance, it enables the standard errors of the coefficients,
and of their differences, to be found quickly.

The method referred to is to find first of all the reciprocal
of the matrix of correlations of the tests. This is
done by pivotal condensation also, as illustrated in the
table overleaf. The matrix whose reciprocal is required
appears in the top left-hand corner, with a diagonal array
of minus ones on its right, and a diagonal of plus ones
below it. The whole is condensed in the manner already
described in Section 7, and the required reciprocal matrix

and also that nearly half the numbers can be written down
from symmetry.

The regression coefficients for any criterion are then
obtained by multiplying the rows of the reciprocal by the
criterion correlations and then adding the columns. In
the example of Section 7 we multiply the first row of the
reciprocal by .72, the second by .63, and so on. The
addition of the columns then gives the same regression
coefficients as were found in Section 7.
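The rule of multiplying the rows of the reciprocal by the criterion correlations and adding the columns may be sketched as follows. The reciprocal is found here by ordinary Gauss-Jordan elimination rather than by the pivotal condensation of the text; the helper names `inverse` and `coefficients` are assumptions introduced for the illustration.

```python
def inverse(a):
    """Reciprocal of a square matrix by Gauss-Jordan elimination (no pivoting)."""
    n = len(a)
    m = [list(a[i]) + [1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        m[k] = [v / m[k][k] for v in m[k]]
        for i in range(n):
            if i != k:
                m[i] = [vi - m[i][k] * vk for vi, vk in zip(m[i], m[k])]
    return [row[n:] for row in m]

R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]
recip = inverse(R)

def coefficients(criterion_r):
    """Multiply row i of the reciprocal by the i-th criterion correlation, then add the columns."""
    n = len(criterion_r)
    return [sum(criterion_r[i] * recip[i][j] for i in range(n)) for j in range(n)]

print([round(b, 3) for b in coefficients([0.72, 0.63, 0.58, 0.41])])
```

Once `recip` has been found, a second criterion needs only a second call of `coefficients` with its own correlations, which is the first advantage claimed for the method.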

10. Variances and covariances of regression coefficients.
The most important advantage of this method is that
whatever the criterion, the variances and covariances of the
regression coefficients are proportional to the cells of the
above reciprocal matrix (Fisher, 1925, 15 and 1922, 611).
This enables their absolute values for any given criterion
to be obtained by multiplying by 1 - r_m² (the defect of
the square of the multiple correlation from unity), and
dividing by the number of “degrees of freedom,” which
is for full correlations N - p - 1 where N is the number
of persons tested, and p the number of tests. For partial
correlations the degrees of freedom are reduced by the
number of variables “partialled out.”

Thus in our example, where p = 4, if N had been 105,
N - p - 1 would be 100. The multiple correlation was
.83, and 1 - r_m² = .312 (see Section 7). The variances and
covariances of our four regression coefficients are in this
case equal to the reciprocal matrix multiplied by .00312:

         .0075   -.0017   -.0042   -.0016
        -.0017    .0038    .0006   -.0004
        -.0042    .0006    .0061   -.0004
        -.0016   -.0004   -.0004    .0042

The standard errors of the regression coefficients are the
square roots of the diagonal elements:

    Regression coefficients    .390    .431    .222    .018
    Standard errors            .087    .062    .078    .065
    Significant?               Yes     Yes     Yes     No

The correlations of the regression coefficients will be got
by dividing each row and column by the square root of
the diagonal element. We obtain:

         1.00    -.31    -.62    -.28
         -.31    1.00     .12    -.10
         -.62     .12    1.00    -.08
         -.28    -.10    -.08    1.00


We can now calculate the standard error of the difference
between any pair of the regression coefficients and see
whether they differ significantly. Take, for example, those
for Test 1 (.390) and Test 2 (.431). The difference is .041.
Its standard error is the square root of

$$.0075 + .0038 + 2 \times .31 \times .087 \times .062 = .0146$$

so that the standard error of .041 is .121.

The difference is therefore not significant when N = 105.
Had N been larger it might have been.
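The standard error of a difference follows from the variances and the covariance by the usual rule for the variance of a difference. The short Python sketch below takes the three cells for Tests 1 and 2 as printed above; the variable names are assumptions, and small discrepancies in the last figure arise from the rounding of the printed cells.

```python
from math import sqrt

# Multiplier applied to the reciprocal matrix: (1 - r_m squared) / (N - p - 1).
scale = (1 - 0.688) / (105 - 4 - 1)

var_1, var_2 = 0.0075, 0.0038     # diagonal cells for Tests 1 and 2
cov_12 = -0.0017                  # the corresponding off-diagonal cell

se_1, se_2 = sqrt(var_1), sqrt(var_2)
r_12 = cov_12 / (se_1 * se_2)     # correlation of the two regression coefficients

# Variance of a difference: var(b1) + var(b2) - 2 cov(b1, b2).
se_diff = sqrt(var_1 + var_2 - 2 * cov_12)
difference = 0.431 - 0.390

print(round(scale, 5))
print(round(se_1, 3), round(se_2, 3), round(r_12, 2))
print(round(se_diff, 3), round(difference / se_diff, 2))
```

The difference of .041 is only about a third of its standard error, in agreement with the conclusion of the text.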

11. The geometrical picture of regression. Before we close
this chapter it will be illuminating to consider what regression
and estimation mean in terms of the geometrical
picture of Chapter VI. Consider the illustration used in
the earlier pages of the present chapter, with the matrix:

             x       y      z
       x    1.0  |  .7     .5
           ------+-------------
       y     .7  | 1.0     .3
       z     .5  |  .3    1.0


Here x is the criterion, y and z are the tests. Each of
them can be represented by a directed line, as explained in
Chapter VI, with angles between these lines such that
their cosines are the above correlations. The three lines
will then be in an ordinary space of three dimensions.

The two tests y and z themselves have, of course, lines
which lie in a plane: any two lines springing from the
same point as origin lie in a plane. The criterion line x
is not in this plane (say, the table top, on which we may
imagine lines y and z to lie), but makes an angle with it.
The problem of regression and multiple correlation is, in
terms of this geometrical picture, to find the line in the plane
of y and z which makes the smallest possible angle with the
line x: for the smallest possible angle corresponds to the
largest possible correlation. Clearly this desired line is the
line which is the projection of the line x on to the yz plane,
the shadow thrown by x on the table with the sun right
overhead. In Figure 29 it is the line OB, where B is vertically
below a point A on the test line x.

Figure 29.

The regression coefficients are numbers which express
the proportions in which the tests y and z have to be combined
to give this line OB. It is just like the parallelogram
of forces. If from B we draw parallels to the two test lines,
we obtain OY and OZ as the distances to be measured along
the two test lines to give a resultant along OB, which is as
near as we can come to OA. (No combination of y and z
can give a line out of their plane.) If the distance OA is
taken as unity, the distances OY and OZ are the actual
regression coefficients. If a wire model like Figure 29 were
made with the proper angles with cosines x with y equal to
.7, x with z equal to .5, and y with z equal to .3, the distances
OY and OZ would be found to be .6044 and .3187. And
the cosine of the angle BOA would be .763, the value we
found for the multiple, or highest possible, correlation of
the two-test battery with x in Section 5 of this chapter.
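The wire model can be imitated by coordinates. In the Python sketch below the choice of axes is an assumption made for convenience: the line y is laid along the first axis, z lies in the table top (the plane of the first two axes), and the third axis is the vertical. The variable names are likewise introduced here.

```python
from math import sqrt

r_xy, r_xz, r_yz = 0.7, 0.5, 0.3

# Unit vectors for the two tests, both lying on the table top.
y = (1.0, 0.0, 0.0)
z = (r_yz, sqrt(1 - r_yz ** 2), 0.0)

# The criterion line x (OA of unit length) has cosine r_xy with y and r_xz with z.
x1 = r_xy
x2 = (r_xz - r_xy * r_yz) / z[1]
x3 = sqrt(1 - x1 ** 2 - x2 ** 2)     # height of A above the table

# B is vertically below A, so OB is x with its third coordinate dropped.
ob = (x1, x2)

# Parallelogram of forces: resolve OB along y and z, ob = oy * y + oz * z.
oz = ob[1] / z[1]
oy = ob[0] - oz * z[0]
cos_boa = sqrt(ob[0] ** 2 + ob[1] ** 2)   # |OB| / |OA| with |OA| = 1

print(round(oy, 4), round(oz, 4), round(cos_boa, 3))   # 0.6044 0.3187 0.763
```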

12. Estimation the same as projection. Let us now consider
a man P whose two scores in the Tests y and z we
know, and whose probable score in Test x we wish to
estimate. His two scores OM and ON in y and z enable
us to assign to this man a point P on the yz plane, a point
so chosen that its projections on to the y and z vectors
give the scores made by him in those tests (see Figure 30).
But we cannot say that this is his point in the three-dimensional
space of x, y, and z. His point in that space may be anywhere
on a line at right angles to the plane yz. For
from anywhere on that line, projections on to y and z fall
on the points M and N. Yet the projection on to the
vector x, which gives his score in the criterion test x,
depends very much on the position of his point on the line
P′PP″. All the people represented by points on that line
have the same scores in y and z but different scores in x,
and our man may be any one of them. Before deciding
what to do in these circumstances, let us consider this set
of people P′PP″ in more detail.

Figure 30.

It will be remembered that the whole population of
persons is represented by a spherical swarm of points,
crowded together most closely round about the origin O,
and falling off in density equally in all directions from
that point. Every test line is a diameter of this sphere,
and the plane containing any two test vectors divides the
spherical swarm into equal hemispheres. It follows that
a line like P′PP″ is a chord of the sphere at right angles to
a diameter (the line OP), and consequently that it is
peopled symmetrically on both sides of P, both upwards
along PP′ in our figure, and downwards along PP″, the
men on the line being most crowded near the point P itself.
The average man of the array of men P′PP″ (who are all
alike in their scores in the two tests y and z) is therefore
the man at P, and since we do not know exactly where
our candidate’s point is along P′PP″, we take refuge in
guessing that he is the average man of his group and is at
the point P itself. From P, therefore, we drop a perpendicular
on to the vector x, and take the distance OL as
representing his estimated score in that test. This geometrical
procedure corresponds exactly to the calculation
we made, as a little solid trigonometry will show the
mathematical reader. The non-mathematical reader must
take it on trust, but the model may illuminate the calculation.
OL is the average of all the different scores x that a
person with scores OM and ON can have. The estimate
will only be certain if the line x itself is on the table; it
will be less and less certain, the more the line is inclined
to the table.

It should be noted that the angles which three test
vectors make with each other are impossible angles, if the
determinant of the matrix of correlations becomes negative.
Ordinarily, that determinant is positive. In our present
example we have, for example:

$$\begin{vmatrix} 1.0 & .7 & .5 \\ .7 & 1.0 & .3 \\ .5 & .3 & 1.0 \end{vmatrix} = .38$$

Such a determinant, however, though it cannot be
negative, can be zero, namely in the cases where the two
smaller angles exactly equal the largest. In that case the
three vectors lie in one plane: the criterion line has
sunk until it too lies on the table. In that case alone,
when the determinant is zero, the “estimation” is certain,
and all the people in the line P′PP″ have not only the same
scores in y and z, but also the same scores in x. The
vanishing of the above determinant therefore shows that
this is so. And in more than three dimensions, although
we can no longer make a model, the vanishing of the
determinant:

$$\begin{vmatrix}
1 & r_{01} & r_{02} & r_{03} & \cdots & r_{0n} \\
r_{01} & 1 & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{02} & r_{12} & 1 & r_{23} & \cdots & r_{2n} \\
r_{03} & r_{13} & r_{23} & 1 & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
r_{0n} & r_{1n} & r_{2n} & r_{3n} & \cdots & 1
\end{vmatrix}$$

shows that the criterion z0 can be exactly estimated from
the team z1, z2 ... zn. In fact, the multiple correlation
r_m, which we have already learned to calculate in another
way, can also be calculated as

$$r_m = \sqrt{1 - \frac{\Delta}{\Delta_{00}}}$$

where Δ is the whole determinant, and Δ00 is the minor
left after deleting the criterion row and column. This
expression clearly becomes equal to unity when Δ = 0.
In our small example x, y, z, we have

$$\Delta = .38 \qquad \Delta_{00} = .91$$

$$r_m = \sqrt{1 - \frac{.38}{.91}} = \sqrt{.5824} = .763$$

as we already know it to be from Section 5.
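The determinantal expression for the multiple correlation is easily checked. The Python sketch below expands determinants along the first row, which is tolerable only for small matrices; the function names `det` and `multiple_correlation` are assumptions introduced here, and the criterion is taken to occupy the first row and column.

```python
from math import sqrt

def det(m):
    """Determinant by expansion along the first row (suitable for small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def multiple_correlation(full):
    """Square root of 1 - D / D00, where row and column 0 belong to the criterion."""
    d = det(full)
    d00 = det([row[1:] for row in full[1:]])
    return sqrt(1 - d / d00)

small = [[1.0, 0.7, 0.5],
         [0.7, 1.0, 0.3],
         [0.5, 0.3, 1.0]]
print(round(det(small), 2), round(multiple_correlation(small), 3))   # 0.38 0.763
```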
13. The “centroid” method and the pooling square. The
pooling square, which we have learned to use in this
chapter, enables us to see in another light the nature of the
factors first arrived at by the “centroid” method.

Let us suppose that the tests z1, z2, z3, and z4 have the
correlations shown, and let us by the aid of a pooling square
find the correlation of each of them with the average of all.
This means giving each test an equal weight in pooling it.

                              Equal Weights
                     z1   |   z1     z2     z3     z4
              z1      1   |    1    r12    r13    r14
                   -------+----------------------------
              z1      1   |    1    r12    r13    r14
     Equal    z2    r12   |  r12     1     r23    r24
     Weights  z3    r13   |  r13    r23     1     r34
              z4    r14   |  r14    r24    r34     1

The correlation of z1 with the average of all is then
obtained from the above pooling square,
which condenses to:

                     1             |  1 + r12 + r13 + r14
           ------------------------+------------------------
             1 + r12 + r13 + r14   |  Sum of all the cells
                                   |  of the table of
                                   |  correlations.

and the correlation coefficient is

$$\frac{1 + r_{12} + r_{13} + r_{14}}{\sqrt{\text{above sum}}}$$


This, however, is exactly the centroid or simple summation
process applied to a table with full communalities of
unity. The first centroid factor obtained from such a table
is simply, for each individual, the average of his four test
scores, and the method is called the “centroid” method,
because “centroid” is the multi-dimensional name for an
average (Vectors, Chapter III; and see Kelley, 1935, 59).
The line in our geometrical picture which represents the
first centroid factor is in the midst of the radiating lines
which represent the tests, like the stick of a half-opened
umbrella among the ribs. It does not, however, make
equal angles with the test lines unless these all make
equal angles with each other. If several of them are
clustered together, and the others spread more widely,
the factor will lean nearer to the cluster.
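With unit communalities the first centroid loadings are thus the row sums of the table of correlations divided by the square root of the sum of all its cells. The Python sketch below applies this to the four tests of Section 7, used here merely as convenient figures; the function name `first_centroid_loadings` is an assumption.

```python
from math import sqrt

def first_centroid_loadings(r):
    """Correlation of each test with the unweighted sum of all (unit communalities)."""
    total = sum(sum(row) for row in r)             # sum of all the cells of the table
    return [sum(row) / sqrt(total) for row in r]   # row sum of each test over the root of the total

R = [[1.00, 0.39, 0.69, 0.49],
     [0.39, 1.00, 0.19, 0.27],
     [0.69, 0.19, 1.00, 0.38],
     [0.49, 0.27, 0.38, 1.00]]
loadings = first_centroid_loadings(R)
print([round(v, 3) for v in loadings])
```

Test 1, which correlates most highly with the others, has the largest loading, so that the factor leans towards it in the manner of the umbrella stick described above.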

In the foregoing explanation the communalities have 
been taken as unity, and the factor axis was pictured in 
the midst of the test lines. If smaller communalities 
are used, the only difference is that a specific component 
of each test is discarded, and the first-factor axis must be
pictured as in the midst of the lines representing the other
components of the tests. It can be shown that when 
communalities less than unity are used, if we bear in mind 
that the communal components of the tests are not then 
standardized, the pooling square gives the correlations 
exactly as before, if we use communalities instead of units 
in the diagonal. 

The first centroid factor is the average of the communal 
parts of the tests. 

The later factors in their turn are, in a sense, averages 
of the residues. There are, however, some complications, 
the first being that the average of the residues just as they 
stand is zero. The manner in which Thurstone circum- 
vents this has already been described in Chapter V. 

14. The most predictable criterion—Often a criterion is 
also composed of parts, just as a battery of tests is. If it is 
success in an occupation, the journeyman may be judged 
for skill, for regularity of attendance, for his manner in 
dealing with colleagues or customers, etc. Some of these 
items will consciously or unconsciously be weighted more 
heavily than others in an adjudicator’s assessment of the 
man; and so too in the assessment of a boy’s success in a 
secondary school. If the weights are thus decided by 
employer, or by headmaster, the criterion score becomes 
again one number, the sum of the arbitrarily weighted 
parts. 

Hotelling, however, raised and solved the question of 
how to weight the parts of a criterion so that it would 
correlate most highly with a given battery of tests, also 
weighted in its best way (Hotelling, 1935a, and see Thomson 
1947, 1948, and M. S. Bartlett, 1948). There are, then, 
indeed two weighted batteries. In terms of our geo- 
metrical analogy, the criterion is now no longer a line, as in 
Figures 29 and 30, but a space, and the problem is to find 
a line in the criterion space, and one in the battery space, 
which will be as near to each other as possible, both spring- 
ing from an origin O common to both spaces. This tech- 
nique, which the reader will find illustrated by an 
arithmetical example in Thomson (1947, 1948), would, for 
instance, enable weights to be given to the tests in two 
different batteries to make these agree with one another as
much as possible. 

15. Weighting for battery reliability—A special case 
arises when the two batteries are composed of alternative 
forms of the same tests, when the correlation between the 
two batteries is the battery reliability, which can be 
enhanced by suitable weighting. 

Thomson (1940) described how to find the best weights
for battery reliability, as a special case of Hotelling’s
“most predictable criterion,” and Peel (1947) has given a
simpler formula than Thomson’s (see page 353 in the
Mathematical Appendix, Section 9a). If there are only
two tests in the battery, with reliabilities $r_{11}$, $r_{22}$ and
correlating with one another $r_{12}$, then Peel’s formula gives
as the maximum attainable reliability the largest root $\mu$ of
the equation

$$\begin{vmatrix} r_{11} - \mu & r_{12}(1 - \mu) \\ r_{12}(1 - \mu) & r_{22} - \mu \end{vmatrix} = 0$$

that is,

$$\mu^2 (1 - r_{12}^2) - \mu (r_{11} + r_{22} - 2 r_{12}^2) + (r_{11} r_{22} - r_{12}^2) = 0.$$

If, for example, $r_{12} = .5$, $r_{11} = .7$, and $r_{22} = .8$, the quadratic
has roots .843 and .490, and a battery reliability of .843
is attainable by using weights proportional to either row of
the above determinant with $\mu = .843$, taken reversed and
with alternate signs, that is

      .0785 and .1431
   or .0431 and .0785
   or 1 and 1.8 approximately.
If as a check we set out a pooling square for the two
batteries it will be:

                 1      1.8   |    1      1.8
          ------------------------------------
     1   |     1.0      .5    |    .7      .5
     1.8 |      .5     1.0    |    .5      .8
          ------------------------------------
     1   |      .7      .5    |   1.0      .5
     1.8 |      .5      .8    |    .5     1.0

and if we multiply the rows and columns by the weights
shown, and add together the quadrants, this reduces to:

         6.04   |  5.092
         5.092  |  6.04

giving a battery self-correlation or reliability of

$$\frac{5.092}{6.04} = .843,$$

as expected.
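
The two-test example can be checked numerically. What follows is only a minimal sketch in Python, not Peel's own computing routine: it solves the quadratic for its largest root and then confirms, from the pooling-square sums, that the resulting weights give that root as the battery reliability. The function names `peel_two_tests` and `pooled_reliability` are illustrative, and the correlation between different tests across the two forms is assumed equal to $r_{12}$, as in the pooling square.

```python
import math


def peel_two_tests(r11, r22, r12):
    """Return the largest root mu of the quadratic and a pair of weights."""
    # mu^2 (1 - r12^2) - mu (r11 + r22 - 2 r12^2) + (r11 r22 - r12^2) = 0
    a = 1.0 - r12 ** 2
    b = -(r11 + r22 - 2.0 * r12 ** 2)
    c = r11 * r22 - r12 ** 2
    mu = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    # First row of the determinant is (r11 - mu, r12 (1 - mu));
    # taken reversed and with alternate signs it gives the weights.
    weights = (r12 * (1.0 - mu), -(r11 - mu))
    return mu, weights


def pooled_reliability(weights, r11, r22, r12):
    """Correlation between the two weighted forms, from the pooling square."""
    w1, w2 = weights
    # Sum of a diagonal quadrant (one form with itself).
    within = w1 * w1 + 2.0 * w1 * w2 * r12 + w2 * w2
    # Sum of an off-diagonal quadrant (one form with the other).
    across = w1 * w1 * r11 + 2.0 * w1 * w2 * r12 + w2 * w2 * r22
    return across / within


mu, weights = peel_two_tests(r11=0.7, r22=0.8, r12=0.5)
ratio = weights[1] / weights[0]
reliability = pooled_reliability(weights, 0.7, 0.8, 0.5)
print(round(mu, 3), round(ratio, 2), round(reliability, 3))
```

Calling `pooled_reliability((1.0, 1.8), 0.7, 0.8, 0.5)` repeats the check with the rounded weights 1 and 1.8 used in the pooling square.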

When there are more than two tests, the solution of the 
above determinantal equation becomes laborious and diffi- 
cult. Green (1950) has given a transformation of the 
equation which enables an iterative process to be used in 
its solution, making it more practicable (see the Mathe- 
matical Appendix, page 353). 

Clearly the weights making a battery as reliable as 
possible will not be the same as those making it most valid 
in predicting a given criterion. There is here a conflict 
of aims, for we want a battery to be both as valid and as 
reliable as possible. It is very desirable that some reason- 
ably simple form of calculation should be devised to find 
those weights which should be given to the tests of a battery 
which, for a given criterion, would make the best com- 
promise, making reliability equal to validity and both as 
great as possible (see Thomson 1940, pages 364 to 365).


CHAPTER XV 
THE ESTIMATION OF A MAN’S FACTORS 


1. Estimating a man’s “g.”—So far, our discussion of
estimation in Chapter XIV has had nothing immediate to
do with factorial analysis. We are next, however, going
to apply these principles of estimation to the problem of
estimating a man’s factors, given his test scores. As we
have already explained in Chapter VII, there is no need to
“estimate” factors when unity is retained in each diagonal
cell; they can be calculated without any loss of exactness
because they are equal in number to the tests; and even
if we analyse out only a few of them, they can be exactly
calculated for a man from his test scores. When we say
exactly here, we mean that the factors are known with the
same exactness as the test scores which are our data.
When communalities are used, however, factors are 
more numerous than the tests, and can therefore only be 
“estimated.” Two men with the same set of test scores 
may have different factors. All we can do is to estimate 
them, and since the test scores of the two men are the 
same, our estimates of their most probable factors will 
be the same. The problem does not differ essentially 
from the estimation of occupational success or of ability in 
any “criterion” test. The loadings of a factor in each
test give the $z_0$ row and column of the correlation matrix.
Let us first consider the case of a hierarchical battery of
tests, and the estimation of g, taking for our example
the first four tests of the Spearman battery used as illustra-
tion in Chapter I, with these correlations:

             z1      z2      z3      z4
      z1    1.00     .72     .63     .54
      z2     .72    1.00     .56     .48
      z3     .63     .56    1.00     .42
      z4     .54     .48     .42    1.00
These correspond, in the analogy with the ordinary cases
of estimation of the first chapter of this part, to the tests
given to a candidate. In those cases, however, there was
a real criterion whose correlations with the team of tests
were known, and formed the $z_0$ row and column of the
matrix. Here the “criterion” is g, and it cannot be
measured directly; it can only be estimated in the manner
we are now about to describe. We have here, therefore,
no row and column of experimentally measured correlations
(Thomson, B. J. P. 25, 94). From the hierarchical matrix of
intercorrelations of the tests, however, we can calculate the
“saturation” or “loading” of each test with the hypo-
thetical g, and use these for our criterion column and row
of correlations. These saturations are the correlation co-
efficients which would be found between each test and a test
of pure g with no specific. We thus arrive at the matrix:


             z0      z1      z2      z3      z4
      z0    1.00     .90     .80     .70     .60
      z1     .90    1.00     .72     .63     .54
      z2     .80     .72    1.00     .56     .48
      z3     .70     .63     .56    1.00     .42
      z4     .60     .54     .48     .42    1.00


and we want to know the best-weighted combination of 
the test scores $z_1$ to $z_4$ in order to correlate most highly
with $z_0 = g$. The problem is now the same as one of
ordinary estimation of ability in an occupation, and the 
mathematical answer is the same. We can, for example, 
use Aitken’s method of finding the regression coefficients, 
although in this case, because of the hierarchical qualities 
of the matrix, there is, as we shall shortly see, an easier 
method. It is, however, illuminating for the student 
actually to work out the regression coefficients as in an 
ordinary case of estimation, as shown on the next page. 

If, therefore, we know the scores $z_1$, $z_2$, $z_3$, and $z_4$ which
a man has made in these four tests, we can estimate his g
by the equation

$$\hat{g} = .5531 z_1 + .2595 z_2 + .1602 z_3 + .1095 z_4$$


                                                                       Check
  (1.00)     .72     .63     .54 |   -1.00       .       .       . |    1.89
     .72    1.00     .56     .48 |       .   -1.00       .       . |    1.76
     .63     .56    1.00     .42 |       .       .   -1.00       . |    1.61
     .54     .48     .42    1.00 |       .       .       .   -1.00 |    1.44
     .90     .80     .70     .60 |       .       .       .       . |    3.00

(2.0764) (.4816)   .1064   .0912 |     .72   -1.00       .       . |   .3992
          1.0000   .2209   .1894 |  1.4950 -2.0764       .       . |   .8289
           .1064   .6031   .0798 |     .63       .   -1.00       . |   .4193
           .0912   .0798   .7084 |     .54       .       .   -1.00 |   .4194
           .1520   .1330   .1140 |     .90       .       .       . |  1.2990

        (1.7253) (.5796)   .0596 |   .4709   .2209   -1.00       . |   .3311
                  1.0000   .1028 |   .8124   .3811 -1.7253       . |   .5712
                   .0597   .6911 |   .4037   .1894       .   -1.00 |   .3438
                   .0994   .0852 |   .6728   .3156       .       . |  1.1730

                (1.4599) (.6850) |   .3552   .1666   .1030   -1.00 |   .3097
                          1.0000 |   .5186   .2432   .1504 -1.4599 |   .4521
                           .0750 |   .5920   .2777   .1715       . |  1.1162

         Regression Coefficients |   .5531   .2595   .1602   .1095 |  1.0823


The multiple correlation of such estimates in a large
number of cases with the true values of g will be by analogy
with our former case given by

$$r_m^2 = .5531 \times .90 + .2595 \times .80 + .1602 \times .70 + .1095 \times .60 = .883$$

$$r_m = .940$$

We must remember, however, that such a correlation here
is rather a fiction. We had in the former case the possi-
bility of comparing our estimates with the candidate's
eventual performance in the occupation or criterion $z_0$.
Here we have no way of knowing g; we only have the
estimates.

As before, we can check the whole calculation by a 
pooling square (see page 200). 
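
As a cross-check on the hand calculation, the same regression weights can be obtained by solving the normal equations directly. The sketch below is an assumption-laden illustration in Python, not Aitken's tabular method: it uses ordinary Gauss-Jordan elimination in a helper named `solve_linear` (a hypothetical name), with the four-test correlations and the g saturations taken from the text.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for row in range(n):
            if row != col:
                f = m[row][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return [m[i][n] for i in range(n)]


# Intercorrelations of the four tests and their saturations with g.
R = [[1.00, 0.72, 0.63, 0.54],
     [0.72, 1.00, 0.56, 0.48],
     [0.63, 0.56, 1.00, 0.42],
     [0.54, 0.48, 0.42, 1.00]]
saturations = [0.9, 0.8, 0.7, 0.6]

# Regression weights for estimating g from the standardized scores.
weights = solve_linear(R, saturations)

# Squared multiple correlation: sum of weight times saturation.
r_m_squared = sum(w * r for w, r in zip(weights, saturations))
r_m = math.sqrt(r_m_squared)
print([round(w, 4) for w in weights], round(r_m, 3))
```

The weights agree with the regression coefficients of the table to within rounding in the fourth decimal place.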

Estimating g from a hierarchical battery is therefore, 
mathematically, exactly the same problem as estimating 
any criterion, and can be done arithmetically in the same 
way. Because of the special nature of the hierarchical 
matrix of correlations, however, with its zero tetrad-
differences, there is an easier way of calculating the estimates
of g, due to Professor Spearman himself (Abilities, xviii).
For its equivalence mathematically to the above see
Appendix, paragraph 10.

Meanwhile we shall illustrate it by an example which
will at least show that it is equivalent in this instance.
The calculation is best carried out in tabular form, and is
based entirely on the saturations or loadings of the tests
with g, which are also their correlations with g.


                                    r_ig^2        r_ig       Regression
   Test   r_ig   r_ig^2  1-r_ig^2  ----------  ----------   Coefficients
                                   1-r_ig^2    1-r_ig^2

     1     .9     .81     .19       4.2632      4.7368         .5531
     2     .8     .64     .36       1.7778      2.2222         .2595
     3     .7     .49     .51        .9608      1.3725         .1602
     4     .6     .36     .64        .5625       .9375         .1095

                              S  =  7.5643
                          1 + S  =  8.5643
                      1/(1 + S)  =   .1168

The quantity S is of some importance in this formula. It
is formed in the fourth column of the table, from which
it will be seen that

$$S = \sum \frac{r_{ig}^2}{1 - r_{ig}^2}$$

It is clear that S will become larger and larger as the
number of tests is increased.

Now, we saw that the square of the multiple correlation
$r_m$ is obtained when we multiply each of the weights by $r_{ig}$
and sum the products. That is to say:

$$r_m^2 = \sum (\text{weight} \times \text{saturation}) = \sum \frac{1}{1 + S} \cdot \frac{r_{ig}}{1 - r_{ig}^2} \cdot r_{ig} = \frac{S}{1 + S} = .883$$

The result, with much less calculation, is the same. 
This fraction will be the nearer to unity, the larger S is;
and S is increased by adding
more (hierarchical) tests to the team. Thus in theory we
can make a team to give as high a multiple correlation
with g as we desire. It will also be noticed, however,
from our table that the tests with high g saturation make
much the largest contribution to S, and therefore to the
multiple correlation.
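
Spearman's tabular method is short enough to state as a few lines of code. The following Python sketch assumes only the saturations .9, .8, .7, and .6 of the example; the function name `spearman_g_weights` is illustrative, and the extra tests of saturation .6 added at the end are hypothetical, introduced merely to show S and the multiple correlation growing as hierarchical tests are added.

```python
def spearman_g_weights(saturations):
    """Return S, the regression weights, and the squared multiple correlation."""
    # Column r / (1 - r^2) of the table.
    ratios = [r / (1.0 - r * r) for r in saturations]
    # S is the sum of the column r^2 / (1 - r^2).
    s_total = sum(r * r / (1.0 - r * r) for r in saturations)
    # Each weight is its ratio multiplied by 1 / (1 + S).
    weights = [q / (1.0 + s_total) for q in ratios]
    return s_total, weights, s_total / (1.0 + s_total)


base = [0.9, 0.8, 0.7, 0.6]
s_total, weights, r_m_squared = spearman_g_weights(base)
print(round(s_total, 4), [round(w, 4) for w in weights], round(r_m_squared, 3))

# Hypothetical enlargement of the team: add 0, 1, 2, 3 further tests of
# saturation .6 and record the squared multiple correlation each time.
growth = [spearman_g_weights(base + [0.6] * k)[2] for k in range(4)]
print([round(v, 3) for v in growth])
```

Because each added test contributes a positive term to S, the fraction S/(1 + S) can only rise, which is the point made in the paragraph above.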

2. Estimating two factors simultaneously.—We have seen 
in the preceding section how to estimate a man’s g from 
his scores in a hierarchical team of tests, and in this we 
shall consider the broader question of estimating factors in 
general. Thus in Chapter V the four tests with correlations:

             1       2       3       4
      1      .      .4      .4      .2
      2     .4       .      .7      .3
      3     .4      .7       .      .3
      4     .2      .3      .3       .

were analysed into two common factors and four specifics
with the loadings (see Chapter V, page 79):

             Common Factors
    Test       I       II             Specific Factors
      1      .5164      .       .8563      .        .        .
      2      .7746    .3162       .      .5477      .        .
      3      .7746    .3162       .        .      .5477      .
      4      .3873      .         .        .        .      .9220


Any one column of these loadings can be used as the 
criterion row in the calculation by Aitken’s method, and 
the regression coefficients calculated with which to weight 
a man’s test scores in order to estimate that factor for 
him. If, as is probable, we want to estimate both common 
factors, we can do the two calculations together, as shown 
on the next page. Both arrays of loadings are written 
below the matrix of intercorrelations, and then pivotal 
condensation automatically gives both sets of regression 
coefficients, with only one extra row in each slab of the
calculation:

                                                                       Check
   (1.0)      .4      .4      .2 |    -1.0       .       .       . |     1.0
      .4     1.0      .7      .3 |       .    -1.0       .       . |     1.4
      .4      .7     1.0      .3 |       .       .    -1.0       . |     1.4
      .2      .3      .3     1.0 |       .       .       .    -1.0 |      .8
   .5164   .7746   .7746   .3873 |       .       .       .       . |  2.4529
       .   .3162   .3162       . |       .       .       .       . |   .6324

           (.84)     .54     .22 |     .40    -1.0       .       . |     1.0
            1.00   .6429   .2619 |   .4762 -1.1905       .       . |  1.1905
             .54     .84     .22 |     .40       .    -1.0       . |     1.0
             .22     .22     .96 |     .20       .       .    -1.0 |      .6
           .5680   .5680   .2840 |   .5164       .       .       . |  1.9365
           .3162   .3162       . |       .       .       .       . |   .6324

                 (.4928)   .0786 |   .1429   .6429    -1.0       . |   .3571
                  1.0000   .1595 |   .2900  1.3046 -2.0292       . |   .7246
                   .0786   .9024 |   .0952   .2619       . -1.0000 |   .3381
                   .2028   .1352 |   .2459   .6762       .       . |  1.2603
                   .1129  -.0828 |  -.1506   .3764       .       . |   .2560

                         (.8899) |   .0724   .1594   .1594 -1.0000 |   .2811
                          1.0000 |   .0814   .1791   .1791 -1.1237 |   .3159
                           .1029 |   .1871   .4116   .4116       . |  1.1134
                          -.1008 |  -.1833   .2291   .2291       . |   .1742

         Regression Coefficients |   .1787   .3932   .3932   .1156 |  1.0809
                                 |  -.1751   .2472   .2472  -.1133 |   .2060


If, therefore, we have a man's scores (in standard
measure) in these four tests, our estimate of his Factor I
will be:

$$.1787 z_1 + .3932 z_2 + .3932 z_3 + .1156 z_4$$

and estimates made in this way will have a multiple
correlation $r_m$ with the “true” values of the factor, in a
number of different candidates, given by:

$$r_m^2 = .1787 \times .5164 + .3932 \times .7746 + .3932 \times .7746 + .1156 \times .3873 = .7462$$

$$r_m = .864$$

Similarly, the multiple correlation of the estimate of the
second factor with the “true” values can be found to be

$$r_m = .395$$


The two factors are not, therefore, estimated with equal 
accuracy by the team. As before, the whole calculation 
can be checked by a pooling square. 
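
The simultaneous estimation of both common factors amounts to solving one set of normal equations with two right-hand sides. The Python sketch below is a plain Gauss-Jordan elimination carrying both loading columns along together, offered as an illustration under that assumption rather than as a transcription of the pivotal condensation; `solve_many` is a hypothetical helper name, and the correlations and loadings are those of the four-test example.

```python
import math


def solve_many(a, rhs_columns):
    """Solve a x = b for several right-hand sides b at once (Gauss-Jordan)."""
    n = len(a)
    k = len(rhs_columns)
    m = [a[i][:] + [col[i] for col in rhs_columns] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [v / p for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    # One solution vector per right-hand side.
    return [[m[i][n + j] for i in range(n)] for j in range(k)]


R = [[1.0, 0.4, 0.4, 0.2],
     [0.4, 1.0, 0.7, 0.3],
     [0.4, 0.7, 1.0, 0.3],
     [0.2, 0.3, 0.3, 1.0]]
loadings_I = [0.5164, 0.7746, 0.7746, 0.3873]
loadings_II = [0.0, 0.3162, 0.3162, 0.0]

weights_I, weights_II = solve_many(R, [loadings_I, loadings_II])

# Multiple correlation of each estimate with its "true" factor.
r_m_I = math.sqrt(sum(w * l for w, l in zip(weights_I, loadings_I)))
r_m_II = math.sqrt(sum(w * l for w, l in zip(weights_II, loadings_II)))
print([round(w, 4) for w in weights_I], round(r_m_I, 3))
print([round(w, 4) for w in weights_II], round(r_m_II, 3))
```

The unequal multiple correlations show directly that the second factor is much less accurately estimated than the first.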

We have now found the regression equations for esti- 
mating the two common factors by treating each in turn 
as a “criterion.” It is also possible to estimate a man’s 
specific factors in the same way. Indeed, we might have 
written the loadings of the specific factors as four more 
rows below the common-factor loadings in the first slab 
and calculated their regression coefficients all in the one 
calculation. But it is easier to obtain the estimate of a 
man’s specific by subtraction (compare Abilities, 1932
edition, page xviii, line 10). For example, we know that
the second test score is made up as follows:

$$z_2 = .7746 f_1 + .3162 f_2 + .5477 s_2$$

where $f_1$ and $f_2$ are the man's common factors and $s_2$ his
specific. We have estimated his $f_1$ and $f_2$, and we know
his $z_2$; so we can estimate his $s_2$ from this equation. The
estimates of all a man’s factors, to be consistent with the 
experimental data, must satisfy this equation and similar 
equations for the other tests. If the estimate of the 
specific is actually made by a regression equation, just like 
the other factors, it will be found to satisfy this require- 
ment.* From the estimates of all a man’s factors, there- 
fore, including any specifics, we can reconstruct his scores 
in the tests exactly. From only a few factors, however, 
even from all the common factors, we cannot reproduce 
the scores exactly, but only approximately. 

* It is interesting to note that we know the best relative loadings
of the tests to estimate a specific by regression without needing to
know how many common factors there are, or whether indeed any
specific exists or not. (Wilson, 1934. For the same fact in more
familiar notation, see Thomson, 1936a, 43.)

3. An arithmetical short cut (Ledermann, 1938a, 1939b).—
If the number of tests is appreciably greater than the
number of common factors, the following scheme for
computing the regression coefficients will involve less
arithmetical labour than the general formulae expounded
in Chapter XIV and applied to the factor problem in this
chapter.*

For illustration, we shall use the data of the preceding 
section (page 225), although in that example the number 
of tests (four) exceeds the number of common factors (two) 
only by two, which is too small an amount to demonstrate 
fully the advantages of the present method. The common- 
factor loadings and the specifics of the four tests form a 
4 × 2 matrix and a 4 × 4 matrix respectively, thus:

$$M_0 = \begin{pmatrix} .5164 & 0 \\ .7746 & .3162 \\ .7746 & .3162 \\ .3873 & 0 \end{pmatrix}, \qquad M_1 = \begin{pmatrix} .8563 & 0 & 0 & 0 \\ 0 & .5477 & 0 & 0 \\ 0 & 0 & .5477 & 0 \\ 0 & 0 & 0 & .9220 \end{pmatrix}$$

the matrix $M_0$ being identical with the first two columns,
and the matrix $M_1$ with the last four columns of the table
on page 225. Before the data are subjected to the com-
putational routine process, which will again consist in the
pivotal condensation of a certain array of numbers, some
preliminary steps have to be taken: (i) the loadings of
each test are divided by the square of its specific, and the
modified values are then listed in a new 4 × 2 matrix:

$$M_1^{-2} M_0 = \begin{pmatrix} .7042 & 0 \\ 2.5820 & 1.0540 \\ 2.5820 & 1.0540 \\ .4556 & 0 \end{pmatrix}$$

e.g. $2.5820 = (.7746) \div (.5477)^2$ and $1.0540 = (.3162) \div (.5477)^2$.

(ii) Next, the inner products (see footnote on page 74) of
every column of $M_0$ in turn with every column of $M_1^{-2} M_0$ are
calculated and arranged in a 2 × 2 matrix:


$$J = M_0' M_1^{-2} M_0 = \begin{pmatrix} 4.5401 & 1.6329 \\ 1.6329 & .6665 \end{pmatrix}$$

If there had been r common factors the matrix J would
have been an r × r matrix. The arithmetic is simplified
by the fact that J is always symmetrical about its diagonal,
so that only the entries on and above (below) the diagonal
need be calculated. (iii) Finally, each element on the
diagonal of J is augmented by unity, giving, in the notation
of matrix calculus, the matrix:

$$I + J = \begin{pmatrix} 5.5401 & 1.6329 \\ 1.6329 & 1.6665 \end{pmatrix}$$

This matrix is now “bordered” below by the matrix
$M_1^{-2} M_0$, and on the right-hand side by a block of minus ones
and zeros in the usual way. The process of pivotal
condensation then yields the same regression coefficients
as were obtained on page 226.

                                                      Check
       5.5401   1.6329 |  -1.0000        . |   6.1730
       1.0000    .2947 |   -.1805        . |   1.1142
       1.6329   1.6665 |        .  -1.0000 |   2.2994
        .7042        . |        .        . |    .7042
       2.5820   1.0540 |        .        . |   3.6360
       2.5820   1.0540 |        .        . |   3.6360
        .4556        . |        .        . |    .4556

                1.1853 |    .2947  -1.0000 |    .4800
                1.0000 |    .2486   -.8437 |    .4050
                -.2075 |    .1271        . |   -.0804
                 .2931 |    .4661        . |    .7591
                 .2931 |    .4661        . |    .7591
                -.1343 |    .0822        . |   -.0520

                       |    .1787   -.1751 |    .0036
       Regression      |    .3932    .2473 |    .6404
       Coefficients    |    .3932    .2473 |    .6404
                       |    .1156   -.1133 |    .0023

* This short cut, in the form here given, is only applicable to
orthogonal factors. For oblique factors, which are described in
Chapter XII, modifications are necessary in Ledermann’s formulae,
for which see Thomson (1949) and the later part of Section 19 of the
Mathematical Appendix, page 365.
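
The three preliminary steps and the final solution can be set out compactly in code. The sketch below is one possible Python rendering of the short cut for orthogonal factors, written for illustration only: the name `ledermann_weights` is hypothetical, and the elimination is done without row interchanges on the assumption that the matrix I + J is symmetric and positive definite, as it is here.

```python
def ledermann_weights(common, specifics):
    """Regression weights for the common factors by Ledermann's short cut.

    common[i][a] is the loading of test i on common factor a;
    specifics[i] is the loading of test i on its own specific.
    """
    n, r = len(common), len(common[0])
    # (i) Divide the loadings of each test by the square of its specific.
    scaled = [[common[i][a] / specifics[i] ** 2 for a in range(r)]
              for i in range(n)]
    # (ii) Inner products of columns of `common` with columns of `scaled`.
    J = [[sum(common[i][a] * scaled[i][b] for i in range(n))
          for b in range(r)] for a in range(r)]
    # (iii) Augment the diagonal by unity and border with `scaled` transposed.
    aug = [[J[a][b] + (1.0 if a == b else 0.0) for b in range(r)]
           + [scaled[i][a] for i in range(n)] for a in range(r)]
    # Gauss-Jordan elimination on the r x r block (no pivoting needed here).
    for c in range(r):
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for row in range(r):
            if row != c:
                f = aug[row][c]
                aug[row] = [x - f * y for x, y in zip(aug[row], aug[c])]
    # Row a now holds the weights of the n tests for common factor a.
    return J, [aug[a][r:] for a in range(r)]


common = [[0.5164, 0.0], [0.7746, 0.3162], [0.7746, 0.3162], [0.3873, 0.0]]
specifics = [0.8563, 0.5477, 0.5477, 0.9220]
J, weights = ledermann_weights(common, specifics)
print([[round(v, 4) for v in row] for row in J])
print([[round(v, 4) for v in row] for row in weights])
```

Only an r × r system is solved, whatever the number of tests, which is the source of the saving in labour when the tests greatly outnumber the common factors.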


4. Reproducing the original scores.—Let us imagine a
man who in each of the four tests in our example obtains 
a score of +1; that is, one standard deviation above the 
average. We choose this set of scores merely to make the 
With this table of loadings in our possession we might
have given vocational advice to a man in a roundabout
way. Instead of inserting his scores $z_1$, $z_2$, $z_3$, and $z_4$ in
the equation for $\hat{z}_0$, we might have estimated his factors
g, v, and F from his scores in the four tests, and then
inserted these estimated factors in the specification equa-
tion of the occupation:

$$z_0 = .55g + .45v + .60F + .37s_0$$

(ignoring the specific $s_0$, which cannot be estimated from
$z_1$, $z_2$, $z_3$, and $z_4$). Had we done so, we should have arrived
at exactly the same numerical estimate of his $z_0$ as by the
direct method (Thomson, 1936a, 49 and 50).

The actual estimation of the factors g, v, and F from 
the four tests will form a good arithmetical exercise for the 
student. The beginning and end of the calculation of the 
regression coefficients is shown here, following exactly 
the lines of the smaller example on page 226 of this chapter : 


                                                                Check
     1.00     .39     .69     .49 |    -1      .      .      . |   1.57
      .39    1.00     .19     .27 |     .     -1      .      . |    .85
      .69     .19    1.00     .38 |     .      .     -1      . |   1.26
      .49     .27     .38    1.00 |     .      .      .     -1 |   1.14
      .66     .37     .52     .74 |     .      .      .      . |   2.29
      .52       .     .66       . |     .      .      .      . |   1.18
      .21     .71       .       . |     .      .      .      . |    .92

This reduces by pivotal condensation step by step to the
three sets of regression coefficients:

      for ĝ     .300     .095     .095     .532
      for v̂     .353    -.153     .581    -.352
      for F̂     .121     .747    -.148    -.206

The result is to give us three equations for estimating
g, v, and F from a man’s scores in the four tests, viz.

$$\hat{g} = .300 z_1 + .095 z_2 + .095 z_3 + .532 z_4$$

$$\hat{v} = .353 z_1 - .153 z_2 + .581 z_3 - .352 z_4$$

$$\hat{F} = .121 z_1 + .747 z_2 - .148 z_3 - .206 z_4$$


Now let us assume a set of scores $z_1$, $z_2$, $z_3$, $z_4$ for a man,
and see what the estimate of his occupational ability is by
the two methods, the one direct without using factors, the
other by way of factors. Suppose his four scores are:

         z1      z2      z3      z4
         .2      .6     -.4      .7

The estimates of his factors g, v, and F will therefore be:

$$\hat{g} = .300 \times .2 + .095 \times .6 + .095 \times (-.4) + .532 \times .7 = .451$$

$$\hat{v} = .353 \times .2 - .153 \times .6 + .581 \times (-.4) - .352 \times .7 = -.500$$

$$\hat{F} = .121 \times .2 + .747 \times .6 - .148 \times (-.4) - .206 \times .7 = .387$$

If now we insert these estimates of his factors into 


the specification equation of the occupation, ignoring its 
specific, we get for our estimate of his occupational success : 

$$\hat{z}_0 = .55 \times .451 + .45 \times (-.500) + .60 \times .387 = .255$$
that is, we estimate that he will be about a quarter of a 
standard deviation better than the average workman. 
This by the indirect method using factors. 

By the direct method, without using factors at all, we
simply insert his test scores into the equation

$$\hat{z}_0 = .390 z_1 + .431 z_2 + .222 z_3 + .018 z_4$$

and obtain

$$\hat{z}_0 = .390 \times .2 + .431 \times .6 + .222 \times (-.4) + .018 \times .7 = .260$$

exactly the same estimate as before, for the difference in
the third decimal place is entirely due to “rounding off”
during the calculations. The third decimal place of the
direct calculation is more likely to be correct, since it is 
so much shorter. 
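
The comparison of the two routes is easily repeated by machine. The short Python sketch below simply re-does the arithmetic of this section with the three-decimal coefficients printed above; the variable names are illustrative, and the small gap between the two answers reflects only the rounding of those printed coefficients.

```python
def dot(weights, scores):
    """Inner product of a row of weights with the test scores."""
    return sum(w * z for w, z in zip(weights, scores))


scores = [0.2, 0.6, -0.4, 0.7]

# Regression weights for estimating each factor from the four tests.
factor_weights = {
    "g": [0.300, 0.095, 0.095, 0.532],
    "v": [0.353, -0.153, 0.581, -0.352],
    "F": [0.121, 0.747, -0.148, -0.206],
}
# Loadings of the occupation on the three common factors.
occupation_loadings = {"g": 0.55, "v": 0.45, "F": 0.60}
# Direct regression weights of the occupation on the four tests.
direct_weights = [0.390, 0.431, 0.222, 0.018]

# Indirect route: estimate the factors, then apply the specification equation.
factor_estimates = {name: dot(w, scores) for name, w in factor_weights.items()}
via_factors = sum(occupation_loadings[name] * factor_estimates[name]
                  for name in occupation_loadings)

# Direct route: one regression equation, no factors.
direct = dot(direct_weights, scores)

print({k: round(v, 3) for k, v in factor_estimates.items()})
print(round(via_factors, 3), round(direct, 3))
```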

6. Why, then, use factors at all?—The reader may now
ask, “What, then, is the use of estimating a man’s factors
at all?” Well, in a case analogous to that of the present
example it is quite unnecessary to use factors at all, and 
there is no doubt that a great many experimenters have 
rushed to factorial analysis with quite unjustifiable hopes 
of somehow getting more out of it than ordinary methods 
of vocational and educational advice can give without 
mentioning factors. But we must not go to the other 
extreme and “ throw out the baby with the bath-water.” 
There may be other reasons for using factors, apart from 
vocational advice. And even in giving such advice, which 
tests and occupations into factors, still more the calculation
of quantitative estimates of these factors, are as yet very
inaccurate, and perhaps are inherently subject to uncer-
tainty. A fluctuating and doubtful coinage can be a
positive hindrance to trade, and barter may be preferable
in such circumstances.

We showed in Section 5 above that a direct regression
estimate of a man’s ability in an occupation gives identically
the same result as an estimate via the roundabout path of
factors, so that at least when the direct regression estimate
is possible there can be no quantitative advantage in using
factors. When, however, is the direct regression estimate
possible, and when is it impossible?

To make the direct regression estimate we require the 
complete table of correlations of the tests with one another 
and with the occupation, and we have to know the candidate’s 
scores in the tests. This implies that these same tests have 
been given to a number of workers whose proficiency in the 
occupation is known, for otherwise we would not know the 
correlations of the tests with the occupation. Under these 
ideal circumstances any talk of factors is certainly unneces- 
sary so far as obtaining a quantitative estimate is concerned. 

But suppose these ideal conditions do not hold! These 
tests which we have given to the candidate have never 
been given, at any rate as a battery, to workers in the 
occupation, and their correlations with the occupation are 
unknown! This situation is particularly likely to arise 
in vocational advice or guidance as distinguished from 
vocational selection. In the latter we are, usually on 
behalf of the employer, selecting men for a particular job, 
and we are practically certain to have tried our tests on 
people already in the job, and to be in a position to make 
a direct estimation without factors. But in vocational 
guidance we wish to gauge the young person’s ability in 
very many occupations, and it is unlikely that just this 
battery of tests that we are using has been given to workers 
in all these different jobs. In that case we cannot make 
a direct regression estimate of our candidate’s probable 
proficiency in every occupation. Can we, then, obtain an 
estimate in any other way ? 
Other ways are conceivable, but it must at the outset
be emphasized that they are bound to be less accurate than 
the direct estimate without factors. Although this battery 
of tests has not been given to workers in the occupation, 
perhaps other tests have, and by the aid of that other 
battery a factor analysis of the occupation has perhaps 
been made. If our tests enable the same factors to be 
estimated, we can gauge the man’s factors and thence 
indirectly his occupational proficiency. Unfortunately, 
the “if” is a rather big one. Are factors obtained by
the analysis of different batteries of tests the same factors;
may they not be different even though given the same
name? We shall discuss this very important point later,
but meanwhile let us suppose that we have reasonable
confidence in the identity of factors called by the same
name by different workers with different batteries. Then
the probable course of events would be something like this. 
An experimenter, using whatever tests he thinks practicable 
and suitable, analyses an occupation into factors. Another 
experimenter, at a different time and place, is asked to 
give advice to a candidate for that occupation. Using 
whatever tests he in his turn has available, he assesses in 
this candidate the factors which the previous experimenter’s 
work leads him to think are necessary in the occupation, 
and gives his advice accordingly. The factors have played 
their part as a go-between, like a coinage. All depends on
the confidence we have in the identity of the factors. We
shall see later that there is only too much reason to think
that the possibility of this confidence being misplaced has
hardly been sufficiently realized by many over-enthusiastic
factorists. And even if the common factors are identical,
there remains the danger that the “specific” of the occu-
pation may be correlated with some of the “specifics”
of the tests, a fact which cannot be known unless the same 
tests have been given to workers in the occupation. 

7. Calculation of correlation between estimates.—We said
above that even although we make our analysis of the tests
we use into uncorrelated factors, the estimates of these
factors will be correlated, if we use communalities and thus
have more factors than tests. Arithmetically, these
correlations are easily calculated from the inner products
of (b), the loadings of the estimated factors with the tests
(page 232), with (a), the loadings of the tests with the
factors (page 231).

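The matrix of loadings of the four tests with the three
common factors is (page 231):

$$M = \begin{pmatrix} .66 & .52 & .21 \\ .37 & 0 & .71 \\ .52 & .66 & 0 \\ .74 & 0 & 0 \end{pmatrix}$$

and the matrix of the loadings of the three estimated
factors with the four tests is (page 232):

$$N = \begin{pmatrix} .300 & .095 & .095 & .532 \\ .353 & -.153 & .581 & -.352 \\ .121 & .747 & -.148 & -.206 \end{pmatrix}$$

Then the matrix of variances and covariances of the
estimated factors is

$$K = NM$$

Performing the matrix multiplications as explained in
Chapter X, Section 4, page 145, we obtain:

$$NM = \begin{pmatrix} .300 & .095 & .095 & .532 \\ .353 & -.153 & .581 & -.352 \\ .121 & .747 & -.148 & -.206 \end{pmatrix} \begin{pmatrix} .66 & .52 & .21 \\ .37 & 0 & .71 \\ .52 & .66 & 0 \\ .74 & 0 & 0 \end{pmatrix} = \begin{pmatrix} .676 & .219 & .130 \\ .218 & .567 & -.034 \\ .127 & -.035 & .556 \end{pmatrix}$$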


If our arithmetic throughout the whole calculation of 
these loadings had been perfectly accurate, the matrix K 
would have been perfectly symmetrical about its diagonal. 
The actual discrepancies (as .127 and .130) are a measure
of the degree of arithmetical accuracy attained.

The matrix K thus arrived at gives by its diagonal
elements .676, .567, and .556, the variances of the three
estimated factors (that is, the squares of their standard
deviations), and by its other elements their covariances in 
pairs (that is, their overlap with one another). The 
correlation of any two estimated factors is equal to (see
Chapter I, Figure 2):

$$r_{ij} = \frac{\text{covariance}(ij)}{\sqrt{\text{variance}(i) \times \text{variance}(j)}}$$

From K we can therefore form the matrix of correlation
of the estimated factors. It is:

$$\begin{pmatrix} 1.000 & .353 & .212 \\ .353 & 1.000 & -.061 \\ .212 & -.061 & 1.000 \end{pmatrix}$$

wherein .353, for example, is $.219 \div \sqrt{.676 \times .567}$.

Although, therefore, the “true” factors g and v are un-
correlated, their estimates $\hat{g}$ and $\hat{v}$ are correlated to an
amount .353. The “true” factors g, v, and F are in standard
measure, but their estimates $\hat{g}$, $\hat{v}$, and $\hat{F}$ have variances of
only .676, .567, and .556 instead of unity. These variances,
be it noted in passing, are equal also to the squares of the
correlations between g and $\hat{g}$, v and $\hat{v}$, F and $\hat{F}$.
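
The product K = NM and the correlations derived from it can be recomputed in a few lines. The Python sketch below takes M and N exactly as printed (to two and three decimals respectively), so that K comes out only approximately symmetrical, as remarked above; the helper name `matmul` is illustrative.

```python
import math


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Loadings of the four tests on the true factors g, v, F.
M = [[0.66, 0.52, 0.21],
     [0.37, 0.00, 0.71],
     [0.52, 0.66, 0.00],
     [0.74, 0.00, 0.00]]
# Regression weights of the estimated factors on the four tests.
N = [[0.300, 0.095, 0.095, 0.532],
     [0.353, -0.153, 0.581, -0.352],
     [0.121, 0.747, -0.148, -0.206]]

# Variances (diagonal) and covariances (off-diagonal) of the estimates.
K = matmul(N, M)

# Correlation = covariance / sqrt(variance_i * variance_j).
corr = [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(3)]
        for i in range(3)]

for row in K:
    print([round(v, 3) for v in row])
for row in corr:
    print([round(v, 3) for v in row])
```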

Not only are the estimates of the common factors 
correlated among themselves; they are correlated with 
the specifics, so that the estimates of the specifics are not 
strictly specific. As a numerical illustration we may take 
the hierarchical matrix used in Section 1, pages 221 f.

             z1      z2      z3      z4
      z1    1.00     .72     .63     .54
      z2     .72    1.00     .56     .48
      z3     .63     .56    1.00     .42
      z4     .54     .48     .42    1.00

The regression estimate of g from this battery is (as we
found on page 223):

$$\hat{g} = .553 z_1 + .259 z_2 + .160 z_3 + .109 z_4$$
The regression estimates for the four specifics can also
be found, either by a full calculation like that of page
226, or by the simpler method of subtraction of page
227. Thus, to estimate $s_1$ in our present example we
know that

$$z_1 = .9g + \sqrt{1 - .9^2}\, s_1 = .9g + .436 s_1$$

Also we know that the estimates $\hat{g}$ and $\hat{s}_1$ will satisfy the
same equation:

$$z_1 = .9\hat{g} + .436\hat{s}_1$$

that is,

$$\hat{s}_1 = \frac{z_1 - .9\hat{g}}{.436}$$

On inserting the expression for $\hat{g}$ into this we get

$$\hat{s}_1 = 1.152 z_1 - .535 z_2 - .333 z_3 - .225 z_4$$

and similarly

$$\hat{s}_2 = -.737 z_1 + 1.313 z_2 - .215 z_3 - .145 z_4$$

$$\hat{s}_3 = -.542 z_1 - .253 z_2 + 1.242 z_3 - .106 z_4$$

$$\hat{s}_4 = -.415 z_1 - .194 z_2 - .121 z_3 + 1.169 z_4$$

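We have now both N, the matrix of loadings of the
estimated factors $\hat{g}$, $\hat{s}_1$, $\hat{s}_2$, $\hat{s}_3$, $\hat{s}_4$ with the four tests, and
M, which we already know, the matrix of loadings of the
four tests with the five factors g, $s_1$, $s_2$, $s_3$, and $s_4$, namely:

$$M = \begin{pmatrix} .9 & .436 & 0 & 0 & 0 \\ .8 & 0 & .600 & 0 & 0 \\ .7 & 0 & 0 & .714 & 0 \\ .6 & 0 & 0 & 0 & .800 \end{pmatrix}$$

From their product NM we obtain the matrix K of
variances and covariances of the estimated factors, namely:

$$K = \begin{pmatrix} .553 & .259 & .161 & .109 \\ 1.152 & -.535 & -.333 & -.225 \\ -.737 & 1.313 & -.215 & -.145 \\ -.542 & -.253 & 1.242 & -.106 \\ -.415 & -.194 & -.121 & 1.169 \end{pmatrix} \begin{pmatrix} .9 & .436 & 0 & 0 & 0 \\ .8 & 0 & .600 & 0 & 0 \\ .7 & 0 & 0 & .714 & 0 \\ .6 & 0 & 0 & 0 & .800 \end{pmatrix}$$

$$= \begin{pmatrix} .880 & .241 & .155 & .115 & .087 \\ .241 & .502 & -.321 & -.238 & -.180 \\ .150 & -.321 & .788 & -.154 & -.116 \\ .116 & -.236 & -.152 & .887 & -.085 \\ .088 & -.181 & -.116 & -.086 & .935 \end{pmatrix}$$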
Again, we have a check on the accuracy of our arith-
metic, for K will, if we have been accurate, be exactly
symmetrical about its principal diagonal, i.e. its diagonal
running from north-west to south-east. The largest dis-
crepancy in our case is between .150 and .155. Moreover,
since in this case K includes all the factors, we have another
check which was not available when we calculated a K for
common factors only: the sum of the elements in the
principal diagonal (called the “trace,” or in German the
“Spur”) here must come out equal to the number of tests.
In our case we have

$$.880 + .502 + .788 + .887 + .935 = 3.992$$

and there are four tests. These elements which form the
trace of K are, it will be remembered, the variances of the
estimates $\hat{g}$, $\hat{s}_1$, $\hat{s}_2$, $\hat{s}_3$, and $\hat{s}_4$, so that we see that the total
variance of the five factors is no greater than the total
variance (viz. 4) of the four tests in standard measure.
This is only another instance of the general law that we 
cannot get more out of anything than we put into it (at 
any rate, not in the long run). 

From K we can at once calculate the correlation of the 
estimated factors. Adjusting the slight arithmetical de- 
partures from symmetry, we get: 


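\[
\begin{array}{c|rrrrr}
 & \hat{g} & \hat{s}_1 & \hat{s}_2 & \hat{s}_3 & \hat{s}_4 \\ \hline
\hat{g}   & 1.000 &  .362 &  .184 &  .131 &  .096 \\
\hat{s}_1 &  .362 & 1.000 & -.510 & -.354 & -.263 \\
\hat{s}_2 &  .184 & -.510 & 1.000 & -.183 & -.135 \\
\hat{s}_3 &  .131 & -.354 & -.183 & 1.000 & -.094 \\
\hat{s}_4 &  .096 & -.263 & -.135 & -.094 & 1.000
\end{array}
\]

from which we see that $\hat{g}$ is correlated with each of the
estimated specifics positively, while the latter are correlated
negatively among themselves, in this (a hierarchical)
example.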
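
The arithmetic of this example can be retraced mechanically. The following is only a sketch added for illustration, not the author's own working: it assumes the g loadings .9, .8, .7, .6 read from the matrix M, forms the regression weights by Spearman's formula, and uses names of our own choosing (`g_load`, `N`, `K`, `corr`).

```python
from math import sqrt

# Assumed from the matrix M: g loadings of the four tests.
g_load = [0.9, 0.8, 0.7, 0.6]
u_load = [sqrt(1 - r * r) for r in g_load]   # specific loadings .436, .600, .714, .800

# Regression (Spearman) weights for g-hat: (r / (1 - r^2)) / (1 + S).
S = sum(r * r / (1 - r * r) for r in g_load)
g_hat = [r / (1 - r * r) / (1 + S) for r in g_load]

def s_hat(i):
    """Weights of s-hat_i on z_1..z_4, from s_i = (z_i - r_i g) / u_i with g-hat inserted."""
    return [((1.0 if j == i else 0.0) - g_load[i] * g_hat[j]) / u_load[i]
            for j in range(4)]

N = [g_hat] + [s_hat(i) for i in range(4)]                      # 5 x 4
M = [[g_load[i]] + [u_load[i] if j == i else 0.0 for j in range(4)]
     for i in range(4)]                                         # 4 x 5

# K = NM: variances and covariances of the estimated factors.
K = [[sum(N[a][t] * M[t][b] for t in range(4)) for b in range(5)]
     for a in range(5)]
trace = sum(K[a][a] for a in range(5))

# Correlations of the estimates: each covariance divided by the two standard deviations.
corr = [[K[a][b] / sqrt(K[a][a] * K[b][b]) for b in range(5)]
        for a in range(5)]

for row in corr:
    print(" ".join(f"{x:7.3f}" for x in row))
print("trace of K:", round(trace, 3))
```

Carried to full precision, K is exactly symmetrical and its trace is exactly 4; the small departures in the text arise from working to three figures.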

We have then this result, that although we set out to 
analyse our battery of tests into independent uncorrelated 
factors, the estimates which we make of these factors are 
correlated with one another, and instead of being in
standard measure have variances, and therefore standard
deviations, less than unity. We could, of course, make 
them unity by dividing all our estimates by their calculated 
standard deviation. But that would make no change in 
their correlations. 

The cause of all this is the excess of factors over tests, 
and consequently this drawback—the correlation of the 
estimates—depends upon the ratio of the number of factors 
to the number of tests. The extra factors are the common 
factors, for there is a specific to each test, and therefore 
with the same number of common factors the correlation 
between the estimates will decrease as the number of tests 
in the battery increases. Just as in the hierarchical case 
one of the tasks of the experimenter is to find tests to add 
to the number in his battery without destroying its hier- 
archical nature, so in the case of a battery which can be 
reduced to rank 2, 3, 4 ... or r, a task will be to add
tests to the battery which with suitable communalities will 
leave the rank unchanged and the pre-existing com- 
munalities unaltered, in order that the common factors 
may be the more accurately estimated, and the estimates 
be more nearly uncorrelated. 

8. Bartlett’s method of estimation. M. S. Bartlett (1935,
1937a, 1938) has proposed to estimate the common factors,
not by the ordinary regression method used above, but by
a method which minimizes the sum of the squares of a
man’s specific factors (already, however, maximized by
the principle of using as few common factors as possible).

The way in which Bartlett’s estimates differ from 
regression estimates of factors can be very clearly seen by 
thinking in terms of the geometrical picture already used 
in earlier chapters. When the factors outnumber the tests, 
the vectors representing the former are in a space of higher 
dimensions than the test space. 

The individual person is represented in the test space 
by a point, namely that point P whose projections on to 
the test vectors give his test scores. We do not know a 
representative point for this individual in the complete 
factor space, however. His representative point Q may 
be, for all we know, anywhere in the subspace which is
perpendicular to the test space and intersects with it at
P. In these circumstances the regression method takes
refuge in the assumption that this individual is average 
in all qualities of which we know nothing; that is, in 
all qualities orthogonal to our test space. It therefore 
assumes P to be his point also in the factor space, and 
projects P on to the factor axes to get the estimates of his 
factors. 

Bartlett’s method is equivalent to a different assumption 
about the position of the point Q. Within the complete 
factor space there is a subspace which contains the common 
factors. Of all the positions open to the point Q, Bartlett's 
method chooses that one which is nearest to the common-factor
space, and from thence projects on to the common-factor
vectors. This is equivalent to making the assumption
that this man is not average in the qualities about which
we know nothing, but instead possesses in those unknown
qualities just those degrees of excellence which bring his
representative point to the chosen point Q. Because men
are most frequently near the average, the regression assumption
is more likely.

9. The geometrical interpretation of Bartlett’s method.
All this can be most clearly seen (because a perspective
diagram can be made) in the case of estimating one general
factor g only, the hierarchical case. A figure like Figure 30
will illustrate this case, if we take y and z there to be two
tests and x to be the g vector (see page 214).

The man’s representative point in the yz plane is P.
But we do not know his representative point Q in solid
three-dimensional space, only that it is somewhere on the
line P′P″. The regression method assumes that it is
actually at P, the average, and projects P itself on to the g
line to get the estimate OL of g. Bartlett’s method, on the
other hand, assumes that Q is at that point on P′PP″ where
it most nearly approaches the g line, that is, somewhere
near the position Q in our diagram. Bartlett’s estimate of
g is then represented by OL′.

Now, any point on the line P′PP″, when projected on to
the test vectors y and z, gives the same two test scores.
There is, in general, no point on the line g which does this
exactly. But clearly L′, of all the points on g, will be the
point whose projections most nearly fall on Y and Z, for
L′ is as near as possible to the line P′P″. That is, the
projection of L′ on to the plane of the tests falls as near
to the point P as is possible. In other words, if we ignore
the specifics entirely and use only the estimated g in the
specification of y and z, Bartlett’s estimate comes as near
as is possible to giving us back the full scores OM and ON.
If the regression estimate OL is projected on to the lines
y and z, it will obviously give a worse approximation.

The regression method, in order to recover as much as 
possible of the original scores, would have to make a 
second estimate of them. For the estimates of g repre- 
sented by quantities like OL are not in standard measure. 
Before projecting the point L on to the lines y and z, 
therefore, to recover the original scores as far as possible, 
the regression method would alter the scale of its space 
along the g vector until the quantities like OL were in 
standard measure. This would not only change the posi- 
tion of L on the line, it would change the angles which 
the lines in the figure make with one another; and would 
change them exactly in such a manner that, in the new space, 
the projection of OL on to y and z would fall exactly where 
the Bartlett projections from L’ fall in the present space 
(Thomson, 1938a). 

There is, therefore, no final difference in excellence 
between the two methods in the matter of restoring the 
original scores as fully as possible, but the regression 
method takes two bites at the cherry. On the other hand, 
the regression estimates can be put straight into the speci- 
fication equation of an occupation which is known to 
require just these common factors, whereas here it is the 
Bartlett method which has to have a second shot. 

Both methods have to change their estimate of g when 
a new test is added to the battery. For the man is not 
very likely to have, in the specific of this new test, either 
the average value previously assumed by the regression 
method, or the special value assumed by the Bartlett 
method. But he is more likely to have the former than 
the latter, so the Bartlett estimates will change more
than do the regression estimates as the battery grows.
Ultimately, when the number of tests becomes infinite, the 
two forms of estimate will agree. 

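In the case of estimates of one general factor g from a
hierarchical battery, the Bartlett estimates differ from the
regression estimates only in scale. They put the candidates
in the same order of merit for g as do the regression estimates,
but give them a greater scatter, making the high
g’s higher and the low g’s lower. The formula is

\[ \hat{g} = \frac{1}{S} \sum_i \frac{r_{ig}}{1 - r_{ig}^2}\, z_i \]

instead of Spearman’s

\[ \hat{g} = \frac{1}{1 + S} \sum_i \frac{r_{ig}}{1 - r_{ig}^2}\, z_i \qquad \text{(see page 224)}, \]

where S, as there, is the sum of the quantities $r_{ig}^2 / (1 - r_{ig}^2)$ over the tests.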
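
The relation between the two formulae is easily tried out numerically. The sketch below is illustrative only: the g loadings are those of the four-test example, the three candidates and their standardized scores are invented for the purpose, and the variable names are ours.

```python
# Assumed illustration: four tests with g loadings .9, .8, .7, .6,
# and three hypothetical candidates with standardized scores z_1..z_4.
g_load = [0.9, 0.8, 0.7, 0.6]
candidates = {
    "P": [1.2, 0.9, 1.0, 0.4],
    "Q": [0.1, -0.3, 0.2, 0.0],
    "R": [-1.0, -0.6, -1.1, -0.2],
}

# S is the sum of r^2 / (1 - r^2) over the tests.
S = sum(r * r / (1 - r * r) for r in g_load)

# The weighted sum of scores is common to both formulae.
weighted = {name: sum(r / (1 - r * r) * z for r, z in zip(g_load, zs))
            for name, zs in candidates.items()}

regression = {name: w / (1 + S) for name, w in weighted.items()}  # Spearman
bartlett = {name: w / S for name, w in weighted.items()}          # Bartlett

for name in candidates:
    print(name, round(regression[name], 3), round(bartlett[name], 3))
print("scale ratio (1 + S) / S =", round((1 + S) / S, 3))
```

Every Bartlett estimate is the regression estimate multiplied by the same factor (1 + S)/S, which exceeds unity; hence the order of merit is unchanged and the scatter greater.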

With more than one common factor, the connexion 
between the two kinds of estimate is not so simple (Appen- 
dix, Section 13). The mathematical reader will be able to 
calculate the Bartlett factor estimates from the matrix 
formulae given in the Appendix.

10. Estimation of oblique factors. In applying the 
method of Section 2 to oblique factors, it is important to 
note that we must use, below the matrix of correlations of
the tests, in a calculation like that on page 226, the matrix 
of correlations of the primary factors with the tests. 
These are the elements of the structure on the primary 
factors, F(A) D, transposed so that columns become rows 
and vice versa. It would not do to use the structure on the 
reference vectors, which is all that most experimenters 
content themselves with calculating. 

Ledermann’s short cut (Section 3 above) requires con- 
siderable modification in the case of oblique factors. See 
Thomson (1949) and the later part of Section 19 of the 
Mathematical Appendix, page 365. 


CHAPTER XVI 
REVERSING THE ROLES* 


1. Exchanging the rôles of persons and tests. In all the
previous chapters the correlations considered have been 
correlations between tests, and the experiments envisaged 
were experiments in which comparatively few tests were 
administered to a large number of persons. For each test 
there would, therefore, be a long list of marks. The whole 
set of marks would make an oblong matrix, with a few 
rows for the tests, and a very large number of columns for 
the persons—we will choose that way of writing it, of the 
two possibilities. 

From such a set of marks we then calculated the 
correlation coefficients for each pair of tests, and our 
analysis of the tests into factors was based upon these. 
In the process of calculating a correlation coefficient we do 
such things to the row of marks in each test as finding its 
average, and finding its standard deviation. We quite 
naturally assume that we can legitimately carry out these 
operations. We assume, that is, that in the row of marks 
for one test these marks are comparable magnitudes which 
at any rate rise and fall with some mental quality even 
if they do not strictly speaking measure it in units, like 
feet or ounces. 

The question we are going to ask in this part of this
book is whether, in the above procedure, the rôles of persons
and of tests can be exchanged (Thomson, 1935b, 75,
Equation 17), and if so what light this throws upon
factorial analysis. Instead of comparatively few tests
(perhaps two or three dozen) and a very large number of
persons, suppose we have comparatively few persons, and
a large number of tests, and find the correlations between
the persons. In that case our matrix of marks would be
oblong in the other direction, with a large number of
rows for the tests, and a small number of columns for
the persons, and each correlation, instead of being as
before between two rows, would be between two columns.

* The first explicit references to correlations between persons in
connexion with factor technique seem to have been made independently
and almost simultaneously by Thomson (1935b, July) and
Stephenson (1935, August), the former being pessimistic, the latter
optimistic. But such correlations had actually been used much
earlier by Burt and by Thomson, and almost certainly by others.
See Burt and Davies, Journ. Exper. Pedag., 1912, 1, 251.

Taking only small numbers for purposes of an explanatory
table, we would have in the ordinary kind of correlations
a table of marks like this:


                Persons
           x   x   x   x   x
Tests      x   x   x   x   x
           x   x   x   x   x


while for correlations between persons we would have a
table of marks like this:


              Persons
           x   x   x
           x   x   x
           x   x   x
Tests      x   x   x
           x   x   x
           x   x   x
           x   x   x


But we meet at once with a serious difficulty as soon as 
we attempt to calculate a correlation coefficient between 
two persons from the second kind of matrix. To do so, 
we must find the average of each column, just as previously 
we found the average of each row for the other kind of 
correlation. But to find the average of each column (by 
adding all the marks in that column together and dividing 
by their number) is to assume that these marks are in 
some sense commensurable up and down the column, 
although each entry is a mark for a different test, on a 
scoring system which is wholly arbitrary in each test 
(Thomson, 1935b, 75-6).

To make this difficulty more obvious, let us suppose
that the first four tests are:

1. A form-board test;

2. A dotting test;

3. An absurdities test;

4. An analogies test.

In each of these the experimenter has devised some 
kind of scoring system. Perhaps in the form-board test 
he gives a maximum of 20 points, and in the dotting test 
the score may be the number of dots made in half a minute. 
But to find the average of such different things as this is 
palpably absurd, and the whole operation can be entirely 
altered by an arbitrary change like taking the number of 
seconds to solve the form board instead of giving points. 
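
The point can be made concrete with a small calculation. The marks below are invented for illustration, and so is the rule converting form-board points into seconds; the sketch merely shows that a correlation between two tests survives such a change of scoring (apart from sign), while a correlation between two persons does not.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical raw marks: rows are tests, columns are three persons.
marks_points = [
    [18, 6, 12],    # form board, points out of 20
    [30, 34, 28],   # dotting, dots in half a minute
    [10, 12, 14],   # absurdities
    [20, 16, 22],   # analogies
]

# The same performances with the form board scored in seconds taken
# (an assumed conversion: fewer points means a longer time).
marks_seconds = [[120 - 5 * p for p in marks_points[0]]] + marks_points[1:]

def column(table, j):
    """Marks of person j in all the tests."""
    return [row[j] for row in table]

# Correlation between two tests (rows): only the sign changes.
r_tests_points = pearson(marks_points[0], marks_points[1])
r_tests_seconds = pearson(marks_seconds[0], marks_seconds[1])

# Correlation between two persons (columns): the value itself changes.
r_persons_points = pearson(column(marks_points, 0), column(marks_points, 1))
r_persons_seconds = pearson(column(marks_seconds, 0), column(marks_seconds, 1))

print(round(r_tests_points, 3), round(r_tests_seconds, 3))
print(round(r_persons_points, 3), round(r_persons_seconds, 3))
```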

2. Ranking pictures, essays, or moods.—This is a very 
fundamental difficulty which will probably make correla- 
tions between persons in the general case impossible to 
calculate. In certain situations, however, it does not arise, 
namely where each person can put the “ tests ” in an 
order of preference according to some criterion or judg- 
ment (Stephenson, 1935), and it is with cases of this kind 
that we shall deal in the first place. Usually the “ tests ” 
here are not really different tests like those named above; 
but are perhaps a number of children’s essays which have 
to be placed in order of merit, or a number of pictures in 
order of æsthetic preference, or a number of moods which 
the subject has to number, indicating the frequency of 
their occurrence in himself. Indeed, the subject might not 
only give an order of preference to, say, the essays, but 
might give them actual marks, and there would be no 
absurdity in averaging the column of such marks, or in 
correlating two such columns, made by different persons. 

Such a correlation coefficient would show the degree of 
resemblance between the two lists of marks given to the 
children, or given to a set of pictures according to their 
æsthetic value. It would indicate, therefore, a resemblance 
between the minds of the two persons who marked the 
essays or judged the pictures. A matrix of correlations 
between several such persons might look exactly like the 
matrices of correlations between tests, and could be
analysed in any of the same ways. What would the
“factors” which resulted from such an analysis mean when
the correlations were between persons? Take an imaginary
hierarchical case first.

3. The two sets of equations. In test analysis the common
factor found was taken to be something called into play
by each test, the different tests being differently loaded
with it. The test was represented by an equation such
as

\[ z_4 = .6g + .8s_4 . \]

For each of the numerous persons who formed the subjects
of the testing, an estimate was made of his g, and
another estimate could be made of his s. The different
tests were combined into a weighted battery for this
purpose of estimating a man’s amount of g. His score in
Test 4 would then be made up of his g and s, inserted in
the above specification equation.

\[ z_{4.9} = .6g_9 + .8s_{4.9} \]

would be the score of the ninth person in Test 4.

By analogy, when we analyse a matrix consisting of 
correlations between persons, we arrive at a set of equations 
describing the persons in terms of common and specific 
factors. Corresponding to a hierarchical battery of tests, 
we could conceivably have a hierarchical team of persons, 
from which we would exclude any person too similar to 
one already included. Each person in the hierarchical 
team would then be made up of a factor he shared with 
everyone else in the team, and a specific factor which was 
his own idiosyncrasy. An equation like

\[ z_9 = .4g' + .917s_9' \]

would now specify the composition of the ninth person.
$g'$ is something all the persons have, $s_9'$ is peculiar to
Person 9. The loadings now describe the person, and the
amount of $g'$ “possessed” or demanded by each test can
be estimated by exactly the same techniques employed in
Chapter XV. The score which Test 4 would elicit from
Person 9 would be obtained by inserting the $g'$ and $s_9'$
“possessed” by that test into the specification equation
of Person 9, giving

\[ z_{9.4} = .4g_4' + .917s_{9.4}' \]

This equation is to be compared with the former equation

\[ z_{4.9} = .6g_9 + .8s_{4.9} \]
Both equations ultimately describe the same score, but
$z_{9.4}$ is not identical with $z_{4.9}$. The raw score X is the same,
but the one standardized z is measured from a different
zero, and in different units, from the other. Disregarding
this for the moment, we see that with the exchange of
rôles of tests and persons, the loadings and the factors have
also changed rôles. Formerly, persons possessed different
amounts of g, and tests were differently loaded with it. 
Now, tests possess different amounts of g’, and persons are 
differently loaded with it. We feel impelled to inquire 
further into the relationships of these complementary 
factors and loadings. 
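
That the same raw score yields two different standardized scores may be seen from a toy table. The marks here are hypothetical and the names (`z_by_test`, `z_by_person`) are ours; the sketch only illustrates the two ways of putting one mark into standard measure.

```python
from math import sqrt

def standardize(values):
    """Return the values in standard measure (mean 0, standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Hypothetical raw marks X: rows are tests, columns are persons.
X = [
    [12, 15, 9, 14],
    [30, 28, 35, 31],
    [7, 9, 6, 10],
]

# Standardized over persons within each test (the usual z).
z_by_test = [standardize(row) for row in X]

# Standardized over tests within each person (the z of the reversed analysis).
columns = [standardize([row[j] for row in X]) for j in range(len(X[0]))]
z_by_person = [[columns[j][i] for j in range(len(X[0]))] for i in range(len(X))]

test, person = 1, 2   # zero-based indices of one cell of the table
print(X[test][person],
      round(z_by_test[test][person], 3),
      round(z_by_person[test][person], 3))
```

The first figure printed is the single raw mark; the second and third are its two standardized forms, measured from different zeros and in different units.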

The test which is most highly saturated with g is that 
one which, in terms of Spearman’s imagery, requires most 
expenditure of general mental energy, and is least depen- 
dent upon specific neural engines. It correlates more 
with its fellow-members of the hierarchical battery than 
any other test among them does. It represents best what 
is common to them all. 

The man, in a hierarchical team of men, who is most 
highly saturated with $g'$ is that man who is most like all
the others. His correlations with them are higher than is 
the case for any other man in the team. He is the indi- 
vidual who best represents the type. But a nearer ap- 
proach to the type can be made by a weighted team of men, 
just as formerly we weighted a battery of tests to estimate 
their common factor. 

4. Weighting examiners like a Spearman battery.—Corre- 
lations of this kind between persons were used long before 
any idea of what Stephenson has called “inverted factorial
analysis” was present. The author and a colleague found
in the winter of 1924-5 a number of correlations between
experienced teachers who marked the essays written by
fifty schoolboys upon “Ships” (Thomson and Bailes,
1926). One table or matrix of such correlations between
the class teacher and six experienced head masters who 
marked the essays independently of one another, was as 
follows : 


        Te     A     B     C     D     E     F
Te       .   .60   .69   .56   .69   .63   .67
A      .60     .   .53   .50   .54   .55   .68
B      .69   .53     .   .60   .65   .66   .64
C      .56   .50   .60     .   .67   .67   .65
D      .69   .54   .65   .67     .   .54   .69
E      .63   .55   .66   .67   .54     .   .69
F      .67   .68   .64   .65   .69   .69     .


In the article in question, these different markers were 
compared by correlating each with the pool of all the rest. 
These correlations are shown in the first row of the table 
below. 

Purely as an illustrative example, let us make also an 
approximate analysis of this matrix, and take out at any 
rate its chief common factor. On the assumption that it 
is roughly hierarchical, we can use Spearman’s formula*

\[ \text{Saturation} = \sqrt{\frac{A^2 - A'}{T - 2A}} \]

where A is the sum of the correlations of the marker in
question with all the others, A′ the sum of their squares,
and T the total of all the correlations in the table.
More easily we can insert its largest correlation coefficient
as an approximate communality for each test, and find
Thurstone’s approximate first-factor loadings (see Chapter
V, page 70). We get for the saturations or loadings the
second and third rows of this table:

                                 Te     A     B     C     D     E     F
Correlation with pool of rest   .7?   .67   .76   .73   .77   .75   .82
Spearman saturations           .814  .704  .796  .766  .798  .788  .861
Thurstone method                .81   .73   .80   .78   .80   .80   .85

We see that F is the most “ typical ” examiner of these 
essays, in the sense that he is more highly saturated with 
what is common to all of them; while A conforms least
to the herd. 
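
Both sets of loadings can be recomputed from the correlations between the seven markers. What follows is a sketch under stated assumptions: the entries of `R` are our reading of the printed table, doubtful digits being settled so that the quoted saturations are reproduced, and the function names are our own.

```python
from math import sqrt

names = ["Te", "A", "B", "C", "D", "E", "F"]
# Correlations between the seven markers (diagonal left at zero).
R = [
    [0, .60, .69, .56, .69, .63, .67],
    [.60, 0, .53, .50, .54, .55, .68],
    [.69, .53, 0, .60, .65, .66, .64],
    [.56, .50, .60, 0, .67, .67, .65],
    [.69, .54, .65, .67, 0, .54, .69],
    [.63, .55, .66, .67, .54, 0, .69],
    [.67, .68, .64, .65, .69, .69, 0],
]

def spearman_saturations(R):
    """sqrt((A^2 - A') / (T - 2A)) for each variable of a roughly hierarchical table."""
    T = sum(sum(row) for row in R)           # total of all off-diagonal correlations
    out = []
    for row in R:
        A = sum(row)                         # sum of the variable's correlations
        A_dash = sum(r * r for r in row)     # sum of their squares
        out.append(sqrt((A * A - A_dash) / (T - 2 * A)))
    return out

def centroid_loadings(R):
    """Approximate first-factor loadings, the largest correlation in each
    column serving as trial communality."""
    sums = [sum(row) + max(row) for row in R]
    return [s / sqrt(sum(sums)) for s in sums]

spearman = spearman_saturations(R)
thurstone = centroid_loadings(R)
for n, s, t in zip(names, spearman, thurstone):
    print(f"{n:>2}  {s:.3f}  {t:.2f}")
```

On either reckoning F stands highest and A lowest, as stated in the text.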

* See Chapter III, page 43.

With the same formula which on page 224 we used to estimate
a man’s g from his test scores, we could here estimate
an essay’s $g'$ from its examiner scores. That is to say, the
marks given by the different examiners would be weighted
in proportion to the quantities

\[ \frac{\text{saturation with } g'}{1 - \text{saturation}^2} \]

where $g'$ is that quality of an essay which makes a common
appeal to all these examiners. Their marks (after being
standardized) would therefore be weighted in the proportions
.814/(1 − .814²), etc., that is:

        Te     A     B     C     D     E     F
      2.41  1.40  2.17  1.85  2.20  2.08  3.33
or     .72   .42   .65   .56   .66   .63  1.00


to make global marks for the essays, which could then be 
reduced to any convenient scale. If this were done, the 
result would be the “best” estimate* of that aspect or
set of aspects of the essay which all these examiners are 
taking into account, disregarding all that can possibly 
be regarded as idiosyncrasies of individual examiners. 
Whether we think it the best estimate in other senses is a 
matter of subjective opinion. We may wish the “ idiosyn- 
crasies ” (the specific, that is) of a certain examiner to be 
given great weight. It clearly would not do, for example, 
to exclude Examiner A from the above team merely because 
she is the most different from the common opinion of the 
team, without some further knowledge of the men and the 
purpose of the examination. The “different” member in
a team might, for example, be the only artist on a com- 
mittee judging pictures, or the only Democrat in a court 
judging legal issues, or the only woman on a jury trying 
an accused girl. But in non-controversial matters, if all 
are of about equal experience, it is probable that this 
system of weighting, restricting itself to what is certainly 
common to all, will be most generally acceptable as 
fairest. 

* Best whether we adopt the regression principle or Bartlett's. 
For if only one “ common factor ” is estimated, the difference is 
one of unit only, and the weighting in the text is the “best” on
both systems.

5. Example from “The Marks of Examiners.” This
form of weighting examiners’ marks has probably never 
yet been used in practice. But it has been employed, by 
Cyril Burt, in an inquiry into the marks given by examiners 
(Burt, 1936). As an example, we take the marks given 
independently by six examiners to the answer papers of 
fifteen candidates aged about 16, in an examination in 
Latin. (The example is somewhat unusual, inasmuch as 
these candidates were a specially selected lot who had all 
been adjudged equal by a previous examiner, but it will 
serve as an illustration if the reader will disregard that 
fact.) The marks were (op. cit., 20) : 


                 Examiners
Cand.    A    B    C    D    E    F

  1     39   43   52   37   43   40
  2     39   44   50   43   43   46
  3      ?   51   55   47   46   46
  4     37   46   43   44   40   43
  5     38   47   55   35   45   45
  6     45   50   54   45   45   49
  7     42   52   51   45    ?   46
  8     43   49   53   47   46   46
  9     32   42   49   34   36   38
 10     37   40   48   37   39   ?2
 11     38   42   47   39   36   39
 12     40   44   50   41   36   42
 13     38   43   50   36   34   41
 14     35   45   49   37   40   40
 15     32   38   41   28   34   34


The correlations between the examiners calculated from 
this table are (the examiner with the highest total correla- 
tion leading) : 


        F     A     B     E     D     C
F       .   .86   .84   .82   .84   .71
A     .86     .   .80   .74   .85   .71
B     .84   .80     .   .80   .81   .67
E     .82   .74   .80     .   .72   .69
D     .84   .85   .81   .72     .   .48
C     .71   .71   .67   .69   .48     .


If, assuming this table to be hierarchical, we find each
examiner’s saturation with the common factor by Spearman’s
formula, we obtain (with Professor Burt, op. cit.,
294):

        F     A     B     E     D     C
      .95   .92   .91   .87   .84   .72


In the sense, therefore, of being most typical, F is here 
the best examiner. The proportionate weights to be given 
to each examiner, in making up that global mark for the 
candidate which will best agree with the common factor of 
the team of examiners, are, as before— 


\[ \frac{\text{saturation}}{1 - \text{saturation}^2} \]


provided the marks have first been standardized. The 
resulting weights, giving F the weight unity, are: 


        F     A     B     E     D     C
     1.00   .61   .54   .37   .29   .15


(If the weights are to be applied to the raw or unstan- 
dardized marks, they must each be divided by that 
examiner’s standard deviation.) 

The marks thus obtained are only an estimate of the 
true common-factor mark for each child, just as was 
the case in estimating Spearman’s g; and the correlation 
of these estimates with the “true” (but otherwise undiscoverable)
mark will be, as there (Chapter XV, page 224),

\[ r_m = \sqrt{\frac{S}{1 + S}} \]

where S is the sum of all the six quantities

\[ \frac{\text{saturation}^2}{1 - \text{saturation}^2} \]

In our case this gives

\[ r_m = .98 \]

The best examiner’s marking itself correlated with the
hypothetical “true” mark to the amount .95, so that
the improvement is not worth the trouble of weighting,
especially as the simple average of the team of examiners
gives .97. But in some circumstances the additional
labour might be worth while, and there is an interest in
knowing which examiners conform least and which most 
to the team, and having a measure of this. 
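
The weights and the correlation just quoted follow directly from the six saturations. The sketch below starts from the saturations as printed (.95, .92, .91, .87, .84, .72); the helper `global_mark` and the other names are ours, and the marks passed to it are supposed already standardized.

```python
from math import sqrt

# Saturations quoted in the text for examiners F, A, B, E, D, C.
names = ["F", "A", "B", "E", "D", "C"]
saturation = [0.95, 0.92, 0.91, 0.87, 0.84, 0.72]

# Weight for standardized marks: saturation / (1 - saturation^2).
raw_weight = [s / (1 - s * s) for s in saturation]
weight = [w / raw_weight[0] for w in raw_weight]      # F given weight unity

# Correlation of the weighted estimate with the "true" common-factor mark.
S = sum(s * s / (1 - s * s) for s in saturation)
r_m = sqrt(S / (1 + S))

def global_mark(z_marks):
    """Weighted sum of one candidate's standardized marks, in the order of `names`."""
    return sum(w * z for w, z in zip(weight, z_marks))

print([round(w, 2) for w in weight])
print(round(r_m, 2))
```

For raw marks each weight would further be divided by that examiner's standard deviation, as the text notes.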

After the saturation of each examiner with the hypothet- 
ical common factor has been found, the correlations due 
to that factor can be removed from the table exactly as 
in analysing tests. The residues, as there, may show 
the presence of other factors; and “specific” resem- 
blances or antagonisms between pairs of examiners, or 
minor factors running through groups of examiners, may 
be detected and estimated. 

In short, all the methods used on correlations between 
tests may be employed on correlations between examiners. 
The tests have come alive and are called examiners, that 
is all. But since the child’s performance, judged by 
the different examiners differently, is here nevertheless 
the same identical performance, our interpretation of the 
results is different. The two cases throw light on one 
another. A Spearman hierarchical battery of tests may
estimate each child’s general intelligence, which is there 
something in common among the tests. The examiners 
may have been instructed to mark exclusively for what 
they think is general intelligence. In that case their 
weighted team will estimate for each child a general 
intelligence, which is something in common among the 
somewhat discrepant ideas the examiners hold on this 
matter. 

6. Preferences for school subjects. In the previous sec- 
tions we have discussed correlations between examiners 
who all mark the same examination papers. The purpose 
of their marking these papers is to award prizes, distinc- 
tions, passes, and failures to the candidates. The exam- 
iners are a means to this end; the reason for employing 
several of them is to obtain a list of successes and failures 
in which we can have greater confidence. The technique 
described is one which enables us to combine their marks, 
on certain assumptions, to greatest advantage. But it 
can, as in the inquiries described in The Marks of Examiners, 
be turned to compare individual examiners, and to evaluate 
the whole process of examining.

It is only a step to another, very similar, experiment in
which objects evaluated by the “ examiners” are not the 
works of candidates in an examination, but are objects 
chosen for the express purpose of gaining an insight into 
the minds of those asked to judge them. Thus we might 
ask several persons each to evaluate on some scale the 
æsthetic appeal of forty or fifty works of art (Stephenson,
1936b, 353), or ask a number of school pupils each to place 
in order of interest a list of school subjects. 

Stephenson (1936a) asked forty boys and forty girls 
attending a higher school in Surrey, England, thus to 
place in order of their preference twelve school subjects 
represented by sixty examination papers, and calculated 
for about half these pupils the correlation coefficients 
between them. To explain the kind of outcome that may 
be expected from such an experiment it will be sufficient 
for us to quote his data for a smaller number of pupils, 
say eight girls, avoiding anomalous cases for simplicity in 
a first consideration. The correlations between them were 
as follows (op. cit., 50) : 


Girl      3     4     5     7    17    18    19    20
  3       .   .59   .31   .26  -.02  -.16  -.88  -.35
  4     .59     .   .75   .42  -.23  -.01  -.66  -.03
  5     .31   .75     .   .65  -.29  -.02  -.18  -.08
  7     .26   .42   .65     .  -.50  -.15  -.54  -.17
 17    -.02  -.23  -.29  -.50     .   .60   .52   .72
 18    -.16  -.01  -.02  -.15   .60     .   .09   .79
 19    -.88  -.66  -.18  -.54   .52   .09     .   .40
 20    -.35  -.03  -.08  -.17   .72   .79   .40     .


This table at once suggests that these girls fall into two 
types. Girls 3, 4, 5, and 7 correlate positively among 
themselves; they have somewhat similar preferences 
among school subjects. Girls 17, 18, 19, and 20 correlate 
positively among themselves. But the two groups correlate 
negatively with one another. The two types were different 
in their order of preference, Type I tending, for example, 
to put English and French higher, and Physics and 
Chemistry lower, than Type II (though both were agreed 
that Latin was about the least lovable of their studies!).

7. A parallel with a previous experiment. This experiment,
it will be seen, forms a parallel to that inquiry (also
by Stephenson) described in Chapter I, Section 9, where 
tests fell into two types, verbal and pictorial, with correla- 
tions falling there as here into four quadrants. If we call 
the two types of school pupil here the linguistic (L) and 
the scientific (S), and again use C for the cross-correlations, 
the diagram corresponding to that on page 16 of Chapter I
is:

        L | C
        --+--
        C | S


The chief difference between the two cases is that there 
the cross-correlations, though smaller than hierarchical 
order in the whole table would demand, were nevertheless 
positive. Here, however, the cross-correlations are 
actually negative. 
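
The quadrant pattern can be summarized by averaging the correlations within each quadrant. The figures in the sketch are our reading of the table of correlations between the eight girls, and the grouping into `linguistic` and `scientific` pupils follows the two types named in the text; the function names are ours.

```python
# Upper triangle of the table of correlations between the eight girls.
upper = {
    (3, 4): .59, (3, 5): .31, (3, 7): .26,
    (3, 17): -.02, (3, 18): -.16, (3, 19): -.88, (3, 20): -.35,
    (4, 5): .75, (4, 7): .42,
    (4, 17): -.23, (4, 18): -.01, (4, 19): -.66, (4, 20): -.03,
    (5, 7): .65,
    (5, 17): -.29, (5, 18): -.02, (5, 19): -.18, (5, 20): -.08,
    (7, 17): -.50, (7, 18): -.15, (7, 19): -.54, (7, 20): -.17,
    (17, 18): .60, (17, 19): .52, (17, 20): .72,
    (18, 19): .09, (18, 20): .79,
    (19, 20): .40,
}

linguistic = [3, 4, 5, 7]
scientific = [17, 18, 19, 20]

def r(a, b):
    """Correlation between girls a and b, whichever way round they are named."""
    return upper[(a, b)] if (a, b) in upper else upper[(b, a)]

def mean_within(group):
    """Average correlation among the members of one type (quadrant L or S)."""
    pairs = [(a, b) for i, a in enumerate(group) for b in group[i + 1:]]
    return sum(r(a, b) for a, b in pairs) / len(pairs)

def mean_cross(group_a, group_b):
    """Average cross-correlation between the two types (quadrant C)."""
    return sum(r(a, b) for a in group_a for b in group_b) / (len(group_a) * len(group_b))

mean_L = mean_within(linguistic)
mean_S = mean_within(scientific)
mean_C = mean_cross(linguistic, scientific)
print(round(mean_L, 2), round(mean_S, 2), round(mean_C, 2))
```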

It is true that the signs of all the correlations in the C 
quadrants can in either case be reversed, by reversing the 
order of the lists either of all the earlier or all the later 
variables (there tests, here pupils). But that is not really 
permissible in either case. We have no doubt which is 
the top and which the bottom end of a list of marks, 
whether in a verbal test or a pictorial test; and to reverse 
the order of preference given by either the linguistic or the 
scientific pupils would be simply to stultify the inquiry. 
There is, therefore, a real difference between the cases. 
In the present set of correlations something is acting as an 
“ interference factor.” 

In Chapter I we explained the correlations and their 
tetrad-differences by the hypothesis of three uncorrelated 
factors g, v, and p required in various proportions by the 
tests, and possessed in various amounts by the children. 
The loadings which indicated the proportions of the factors 
in each test we tacitly assumed to be all positive. Thurstone
expressly says that it is contrary to psychological
expectation to have more than occasional negative loadings. 

8. Negative loadings. Let us endeavour to make at least 
a qualitative scheme of factors to express the correlations
between the pupils, factors possessed in various amounts
by the subjects of the school curriculum, and demanded 
in various proportions by each pupil before he will call 
the subject interesting. One type of pupil weights heavily 
the linguistic factor in a subject in evaluating its interest 
to him. The other type weights heavily the scientific 
factor in a subject in judging its attraction for him. But 
to explain actual negative correlations between pupils we 
must assume that some of the loadings are negative, 
assume, that is, that some of the children are actively 
repelled by factors which attract others. Common sense 
does not think thus. Common sense says that two children 
may put the subjects in opposite orders, even though they 
both like them all, provided they don’t like them equally
well. But then common sense is not anxious to analyse
the children into uncorrelated additive factors. If each 
child is thus expressed as the weighted sum of various 
factors, two children can correlate negatively only if some 
of the loadings are negative in the one child and positive 
in the other, for the correlation is the inner product of the 
loadings. Since Stephenson has found numerous nega- 
tive correlations between persons, and since few negative 
correlations are reported between tests, we seem here to 
have an experimental difference between the two kinds of
correlation, and if ever correlations between persons come
to be analysed as minutely and painstakingly as correla-
tions between tests, it would seem that the free admission
of negative loadings would be necessary.* The present
matrix can in fact be roughly analysed into two general 
factors, one of which has positive loadings in all pupils, 
while the other is positively loaded in the one type, 
negatively loaded in the other. 

9. An analysis of moods.—A still more ingenious appli- 
cation by Stephenson of correlations between persons is in 
an experiment in which for each person a “ population ” 

* See Stephenson, 19360, 349. 
of thirty moods, such as “irascible,” “cheerful,” “sunny,”
were rated for their prevalence and intensity for each of
ten patients in a mental hospital, and for six normal
persons (Stephenson, 1936c, 363). This time the correla-
tion table indicated three types, corresponding to the
manic-depressives, the schizophrenes, and the normal 
persons, each type correlating positively within itself, but 
negatively or very little with the other types. These 
experiments were only illustrative, and it remains to be 
seen whether factors which will prove acceptable psycho- 
logically will be isolated in persons in the same manner as g, 
and the verbal factor, have been isolated in tests. The 
parallel between the two kinds of correlation and analysis 
is, however, certainly likely to throw light on the nature of 
factors of both kinds. 


CHAPTER XVII 


THE RELATION BETWEEN TEST FACTORS 
AND PERSON FACTORS 


1. Burt's example, centred both by rows and by columns.—In
the examples we have just considered, there is no doubt
that correlations between persons can be calculated without
absurdity. In the matrix of marks given by a number of ex-
aminers (marking the same paper) to a number of candidates,
either two candidates can be correlated or two examiners.
The heterogeneity of marks referred to in Chapter XVI. 


most determined attempt to find an exact relationship has 
been that made by Cyril Burt, who concludes that, if the 
initial units have been suitably chosen, the factors of the 
one kind of analysis are identical with the loadings of the 
other, and vice versa (Burt, 1937b). The present wri


by using Burt's own small numerical example, based on a 
matrix of marks for four persons in three tests : 
Persons       a     b     c     d

Tests   1    -6     2     0     4
        2     3     1    -1    -3
        3     3    -3     1    -1

It will be noticed that this matrix of marks is already
centred both ways. The rows add up to zero, and so do
the columns. The test scores have been measured from
their means, and then thereafter the columns of personal 
scores have been measured from their means; or it can 
be done persons first, tests second, the end-result being 
the same. Burt does not give the matrix of raw scores 
from which the above matrix comes. 

If we take the doubly centred matrix as he gives it, the 
matrices of variances and covariances formed from it are: 


Test Covariances

          1     2     3

    1    56   -28   -28
    2   -28    20     8
    3   -28     8    20

Person Covariances

          a     b     c     d

    a    54   -18     0   -36
    b   -18    14    -4     8
    c     0    -4     2     2
    d   -36     8     2    26


Notice that in both these matrices the columns add to 
zero, just as they do in the matrices of residues in the 
“ centroid ” process. 
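The two tables of sums of squares and products can be checked directly from the doubly centred marks. The following is only a sketch in Python; the names `marks`, `transpose` and `products` are illustrative and are not Burt's, and, as in the text, the "covariances" are plain sums of products with no division by the number of cases.

```python
# Doubly centred marks: rows are Tests 1-3, columns are persons a-d.
marks = [
    [-6,  2,  0,  4],
    [ 3,  1, -1, -3],
    [ 3, -3,  1, -1],
]

def transpose(m):
    """Turn rows into columns."""
    return [list(col) for col in zip(*m)]

def products(rows):
    """Sums of squares and products between every pair of rows."""
    return [[sum(x * y for x, y in zip(r, s)) for s in rows] for r in rows]

test_cov = products(marks)                # 3 x 3, between tests
person_cov = products(transpose(marks))   # 4 x 4, between persons

for row in test_cov + person_cov:
    print(row)
```

Every column of either result adds to zero, which is the property noted above.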

2. Analysis of the covariances.—Burt next proceeds to 
analyse each of these by Hotelling’s method. It seems 
clear that there will exist some relation between the two 
analyses, since the primary origin of each matrix is the 
same table of raw marks, and to show that relation most 
clearly Burt analyses the covariances direct, and not the 
correlations which could be made from each table (by 
dividing each covariance by the square root of the product 
of the two variances concerned). For the two Hotelling 
analyses he obtains (and the centroid factors before 
rotation would here be the same) : 
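Analysis of the Tests

\[
\begin{aligned}
z_1 &= 2\sqrt{14}\,y_1,\\
z_2 &= -\sqrt{14}\,y_1 + \sqrt{6}\,y_2,\\
z_3 &= -\sqrt{14}\,y_1 - \sqrt{6}\,y_2.
\end{aligned}
\]

Analysis of the Persons

\[
\begin{aligned}
a &= -3\sqrt{6}\,f_1,\\
b &= \sqrt{6}\,f_1 + 2\sqrt{2}\,f_2,\\
c &= -\sqrt{2}\,f_2,\\
d &= 2\sqrt{6}\,f_1 - \sqrt{2}\,f_2.
\end{aligned}
\]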


In both cases two factors are sufficient (there will always
be fewer Hotelling or centroid factors than tests with
a doubly centred matrix of marks, for a mathematical
reason). The reader can check that the inner products
give the covariances, e.g.—

\[ \text{covariance}\,(bd) = \sqrt{6} \times 2\sqrt{6} - 2\sqrt{2} \times \sqrt{2} = 12 - 4 = 8. \]

The method of finding Hotelling loadings was described
in Chapter VII, and the reader can readily check that the
coefficients of $y_1$, for example, do act as required by that
method. For if we use numbers proportional to $2\sqrt{14}$,
$-\sqrt{14}$, and $-\sqrt{14}$, namely 1, -½, -½, as Hotelling
multipliers we get:

    Multiplier      Covariances
         1         56   -28   -28
        -½        -28    20     8
        -½        -28     8    20

    Products       56   -28   -28
                   14   -10    -4
                   14    -4   -10

    Totals         84   -42   -42
    proportional to 1    -½    -½

as required.


The largest total (84) is the first “latent root,” and the
multipliers 1, -½, -½ have to be divided, according to
Chapter VII, by the square root of the sum of their squares,
and multiplied by the square root of 84, giving—

\[ 2\sqrt{14}, \qquad -\sqrt{14}, \qquad -\sqrt{14}. \]
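The multiply-and-rescale check just made can be repeated mechanically. The sketch below is written in the spirit of the iterative process of Chapter VII, but it is an ordinary power iteration and not a transcription of Hotelling's own worksheet; the starting multipliers and the names `multiply`, `weights`, `root` and `loadings` are assumptions made for illustration.

```python
import math

test_cov = [[56, -28, -28], [-28, 20, 8], [-28, 8, 20]]

def multiply(matrix, weights):
    """Weighted totals of the columns, one total per row of the matrix."""
    return [sum(m * w for m, w in zip(row, weights)) for row in matrix]

weights = [1.0, 0.0, 0.0]                 # arbitrary starting multipliers
for _ in range(50):
    totals = multiply(test_cov, weights)
    largest = max(totals, key=abs)
    weights = [t / largest for t in totals]   # rescale so the largest is 1

# When the multipliers reproduce themselves, the scaling total is the latent root.
root = multiply(test_cov, weights)[0] / weights[0]

# Divide by the root of the sum of squares, multiply by the root of the latent root.
norm = math.sqrt(sum(w * w for w in weights))
loadings = [w / norm * math.sqrt(root) for w in weights]
print(weights, root, loadings)
```

With this matrix the multipliers settle at 1, -½, -½ after the first round, so further rounds change nothing.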
3. Factors possessed by each person and by each test.—
Burt then goes on to “estimate,” by “regression equa-
tions,” the amount of the factors y possessed by the
persons, and the amount of the factors f possessed by the
tests. There is a misuse of terms here, for with these
factors there is no need to “estimate”; they can be
accurately calculated: but that is a small point. The first
three equations can be solved for the y’s—there is indeed
one equation too many, but it is consistent. And the four
equations of the second group can be solved for the f’s—
again they are consistent. Since the equations are con-
sistent, we can choose the easiest pair in each case to solve
for the two unknowns. Choosing the two equations for
$z_1$ and $z_2$ we obtain

\[ y_1 = \frac{z_1}{2\sqrt{14}}, \qquad y_2 = \frac{z_1 + 2z_2}{2\sqrt{6}}. \]

For the other set of factors we naturally choose the
equations in a and c, and have

\[ f_1 = -\frac{a}{3\sqrt{6}}, \qquad f_2 = -\frac{c}{\sqrt{2}}. \]


Now, since we are very liable to confusion in this dis- 
cussion, let us remind ourselves what these factors y and 
these factors f are. The factors y are factors into which 
each test has been analysed. They do not vary in amount 
from test to test, but each test is differently loaded with 
them. They vary in amount from person to person. 

The factors f are factors into which each person has been 
analysed. These do not vary in amount from person to 
person, but from test to test. Each person is differently 
loaded with them, that is, made up of them in different 
proportions. The y’s are uncorrelated fictitious tests : the 
f’s are uncorrelated fictitious persons. 


Now, from the equations—

\[ y_1 = \frac{z_1}{2\sqrt{14}}, \qquad y_2 = \frac{z_1 + 2z_2}{2\sqrt{6}} \]

we can find the amount of each factor $y_1$ and $y_2$ possessed
by each person, by inserting his scores $z_1$ and $z_2$ in these
equations, scores which are given in the matrix:

          a     b     c     d

    1    -6     2     0     4
    2     3     1    -1    -3
    3     3    -3     1    -1

Thus the first person possesses $y_1$ in an amount
$-6/2\sqrt{14}$, because his $z_1$ is -6. For the four persons
and the two factors we find the amounts of these factors
possessed by each person to be:

    Factors      y1          y2

       a       -3/√14         0
       b        1/√14        2/√6
       c          0         -1/√6
       d        2/√14       -1/√6


4. Reciprocity of loadings and factors.—These are the
amounts of the factors y possessed by the four persons. If
now the reader will compare them with the loadings of
the factors f in the second set of equations on page 265,
he will see a resemblance. The signs are the same, and
the zeros are in the same places. Moreover, the resemblance
becomes identity if we destandardize the factors $f_1$ and $f_2$,
measuring the former in units $\sqrt{84}$ times as large, and the
latter in units $\sqrt{12}$ times as large, 84 and 12 being the
non-zero latent roots of both matrices. In these units let us
use $\phi_1$ and $\phi_2$ for them. The equations on page 265 giving
the analysis of the persons then become

\[
\begin{aligned}
a &= -\frac{3\sqrt{6}}{\sqrt{84}}\left(\sqrt{84}\,f_1\right) = -\frac{3}{\sqrt{14}}\,\phi_1,\\
b &= \frac{\sqrt{6}}{\sqrt{84}}\left(\sqrt{84}\,f_1\right) + \frac{2\sqrt{2}}{\sqrt{12}}\left(\sqrt{12}\,f_2\right) = \frac{1}{\sqrt{14}}\,\phi_1 + \frac{2}{\sqrt{6}}\,\phi_2,\\
c &= -\frac{\sqrt{2}}{\sqrt{12}}\left(\sqrt{12}\,f_2\right) = -\frac{1}{\sqrt{6}}\,\phi_2,\\
d &= \frac{2\sqrt{6}}{\sqrt{84}}\left(\sqrt{84}\,f_1\right) - \frac{\sqrt{2}}{\sqrt{12}}\left(\sqrt{12}\,f_2\right) = \frac{2}{\sqrt{14}}\,\phi_1 - \frac{1}{\sqrt{6}}\,\phi_2.
\end{aligned}
\]

It will be seen that the loadings of $\phi_1$ and $\phi_2$ are identical
with the amounts of $y_1$ and $y_2$ in the table on page 267.
A similar calculation could be made comparing the amounts
of $f_1$ and $f_2$ possessed by the tests with the loadings of $y_1$
and $y_2$ (suitably destandardized) in the analysis of the
tests. As we said at the outset, if suitable units are chosen 
for the marks and the factors, the loadings of the personal 
equations are the factors of the test equations, and the 
factors of the personal equations are the loadings of the 
test equations. But only for doubly centred matrices of 
marks. It would be wrong to conclude in general that 
loadings and factors are reciprocal in persons and tests. 
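The reciprocity for this doubly centred example can be verified numerically. The sketch below takes the equations of the two analyses as given above; the function name `y_amounts` and the variable names are illustrative only, and the destandardizing step divides each person loading by the square root of the corresponding latent root (84 and 12).

```python
import math

# Doubly centred marks: rows are Tests 1-3, columns are persons a-d.
marks = [[-6, 2, 0, 4], [3, 1, -1, -3], [3, -3, 1, -1]]

def y_amounts(z1, z2):
    """Amounts of y1 and y2 possessed by a person with scores z1 and z2."""
    return (z1 / (2 * math.sqrt(14)), (z1 + 2 * z2) / (2 * math.sqrt(6)))

amounts = [y_amounts(z1, z2) for z1, z2 in zip(marks[0], marks[1])]

# Loadings of f1 and f2 in persons a, b, c, d (analysis of the persons).
f_loadings = [
    (-3 * math.sqrt(6), 0.0),
    (math.sqrt(6), 2 * math.sqrt(2)),
    (0.0, -math.sqrt(2)),
    (2 * math.sqrt(6), -math.sqrt(2)),
]

# Destandardize with the latent roots 84 and 12.
phi_loadings = [(l1 / math.sqrt(84), l2 / math.sqrt(12)) for l1, l2 in f_loadings]

for person, amount, loading in zip("abcd", amounts, phi_loadings):
    print(person, amount, loading)
```

Each printed pair agrees, person by person, which is all that the reciprocity asserts for this matrix.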

Indeed, even for doubly centred matrices of marks, this 
simple reciprocity holds only for the analysis of the 
covariances and not for analyses of the matrices of corre- 
lations. Except by pure accident (and as it happens, 
Burt’s example is in the case of test correlations such an 
accident), the saturations of the correlation analysis will not 
be any simple function of the loadings of the covariance 
analysis. 

5. Special features of a doubly centred matrix.—But in
any case, a matrix of marks which has been centred both 
ways is one in which only a very special kind of residual 
association between the variables is present. Most of what 
we commonly call the association or resemblance between 
either tests or persons, the amount of which we gauge by 
the correlation coefficient, is due to something over and 
above this. We can write down an infinity of possible raw
matrices from which Burt’s doubly centred matrix might
have come. To the rows of the latter matrix we can add
any quantities we like without in the slightest altering the
correlations between the tests, but making enormous
changes in the correlations between the persons. Let us,
for example, add 10 to the top row, 13 to the middle row,
and 16 to the bottom row. There results the matrix:

          a     b     c     d

    1     4    12    10    14
    2    16    14    12    10     (A)
    3    19    13    17    15


This gives as correlations between the persons: 


           a       b       c       d

    a    1.00     .75     .84    -.14
    b     .75    1.00     .28    -.76
    c     .84     .28    1.00     .42
    d    -.14    -.76     .42    1.00


Next, without changing this matrix of correlations 
between persons in the slightest, we can add any quantities 
we like to the columns of the matrix of marks, and produce 
an infinity of different matrices of correlations between 
tests. If, for example, we add 5, 2, 8, and 9 to the four 
columns, we have a matrix of raw marks : 


          a     b     c     d

    1     9    14    18    23
    2    21    16    20    19     (B)
    3    24    15    25    24

This has the same correlations between persons, but the
correlations between tests are now:

           1       2       3

    1    1.00    -.16     .24
    2    -.16    1.00     .92
    3     .24     .92    1.00

Or instead, by adding suitable numbers to the columns
and to the rows, we might have arrived at the matrix:

          a     b     c     d

    1    44    48    18    10
    2    63    57    27    13     (C)
    3    58    48    24    10


or equally well at : 


(D) 


3 34 30 28 28 


The order of merit of the persons in each test is quite
different in each of these matrices. The order of difficulty
of the tests for each person is quite different in each. If
we consider the ordinary correlation between Tests 1 and 2,
we find that it is negative in (B), zero in (D) and positive
in (C), yet all of these matrices reduce to Burt’s matrix
when centred both ways. It is clear that they contain
factors of correlation which are absent in the doubly
centred matrix.

The averages of the rows and the columns of (C) are as
follows:

                a     b     c     d  | Average

    1          44    48    18    10  |   30
    2          63    57    27    13  |   40
    3          58    48    24    10  |   35

    Average    55    51    23    11


The correlation between two tests is clearly influenced 
very much by the fact that here the person a is so much 
cleverer than the person d. Similarly, the correlation 
between two persons is influenced by the fact that Test 1
is more difficult than Test 2. As soon as the matrix is
centred both ways, all the correlation due to these and
similar influences is almost extinguished. Centred by rows,
(C) becomes:

     14    18   -12   -20
     23    17   -13   -27
     23    13   -11   -25

and all the tests are equally difficult on the average.
Centred by columns as well, it becomes:

     -6     2     0     4
      3     1    -1    -3
      3    -3     1    -1


and not only are all the tests equally difficult on the average, 
but all the persons are equally clever on the average. It 
is to the covariances still remaining that Burt’s theorem 
about the reciprocity of factors and loadings applies. It 
does not apply to the full covariances of the matrix centred 
only one way, in the manner usually meant when we speak 
of covariances or of correlations. 
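That quite different raw matrices collapse to the same doubly centred matrix is easily tried out. The sketch below rebuilds matrix (B) by adding 10, 13 and 16 to the rows and 5, 2, 8 and 9 to the columns of Burt's matrix, centres it both ways again, and computes the ordinary correlation between Tests 1 and 2 in the raw marks. The function names are illustrative, not taken from the text.

```python
import math

base = [[-6, 2, 0, 4], [3, 1, -1, -3], [3, -3, 1, -1]]
row_add = [10, 13, 16]      # added to the rows, giving matrix (A)
col_add = [5, 2, 8, 9]      # then added to the columns, giving matrix (B)
raw = [[x + r + c for x, c in zip(row, col_add)]
       for row, r in zip(base, row_add)]

def centre_rows(m):
    """Measure every row from its own mean."""
    return [[x - sum(row) / len(row) for x in row] for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def double_centre(m):
    """Centre by rows, then by columns."""
    return transpose(centre_rows(transpose(centre_rows(m))))

def correlation(x, y):
    """Ordinary product-moment correlation of two lists of marks."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(raw)
print(double_centre(raw))
print(correlation(raw[0], raw[1]))
```

Any other choice of `row_add` and `col_add` leaves the output of `double_centre` unchanged, while the raw correlation between two tests can be moved about almost at will.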

6. An actual experiment.—In Part III of Burt’s The
Factors of the Mind (London, 1940) his principle of reci- 
procity of tests and persons is seen in an actual illustrative 
experiment on the distribution of temperamental types. 

This experiment was on twelve women students, 
selected because the temperamental assessments made by 
various judges on them were more unanimous than in the 
case of the other students. Each, therefore, was a well- 
marked temperamental type. They were assessed for the 
eleven traits seen in the table below. The assessments 
over each trait were standardized, i.e. measured in such 
units and from such an origin that their sum was zero and 
the sum of their squares twelve, the number of persons, 
so that the group was (artificially) made equal in an 
average of sociability, sex, etc. The correlations between
the traits were then calculated, and centroid factors taken
out, the first two of which I shall call by the Roman letters 
u and v. These two are possessed in some amount by 
each of the persons and required, in degrees indicated by 
the saturation coefficients, by each of the traits. These 
saturation coefficients have been found by analysis of the 
correlations between the traits. 
Now according to the reciprocity principle, if we analyse
instead the correlations between the persons, find factors
which we may indicate by Greek letters, and measure the
amounts of these possessed by the eleven traits, these
amounts ought to be the same as the saturation coefficients
of the Roman factors u, v, etc.

Burt therefore further standardizes the assessments, 
by persons this time, and finds the total scores on each 
trait, which are, by a property of centroid factors (see 
page 217) proportional to the amounts of a centroid Greek 
factor possessed by the eleven traits; and the test of the 
reciprocity hypothesis is to see whether these totals are 
similar to the saturations of a Roman factor. The figures 
(from Burt's page 405) are given in the table below: 


                    Saturations of the    Amounts of the
                      Roman factors        Greek factor

                       u         v               α

Sociability          .671      .508            .587
Sex                  .878      .213            .489
Assertiveness        .827      .483            .378
Joy                  .951      .233            .297
Anger                .824      .241            .280
Curiosity            .780     -.268            .001
Fear                 .898     -.159           -.089
Sorrow               .259     -.104           -.887
Tenderness           .564     -.667           -.47
Disgust              .830     -.490           -.489
Submissiveness       .412     -.685           -.525


Clearly the amounts of α do not correspond to the
saturations of u; nor should they, for a general factor
has already been eliminated by the double standardization.
They do, however, agree reasonably well with the satura-
tions of the second Roman factor v, and confirm Burt’s
prediction that, even in this sample, and with factors
which are not exactly principal components, the reci-
procity principle would still hold approximately.


CHAPTER XVIII 


THE INFLUENCE OF UNIVARIATE SELECTION 
ON FACTORIAL ANALYSIS* 


1. Univariate selection.—All workers with intelligence
tests know, or ought to know, that the correlations found 
between tests, or between tests and outside criteria, depend 
to a very great extent indeed upon the homogeneity or 
heterogeneity of the sample in which the correlations were 
measured. If, to take the usual illustration, we measure 
the correlation between height and weight in a sample of 
the population which includes babies, children, and grown- 
ups, we shall obviously get a very high result. If we 
confine our measurement to young people in their teens, 
we shall usually get a smaller value for the coefficient of 
correlation. If we make the group more homogeneous 
still, taking, say, only boys, and all of the same race and 
exactly the same age, the correlation of height and weight 
will be still less.† Through all these changes towards
greater homogeneity in age, the standard deviation (or its
square, the variance) of height has also been sinking, and
the standard deviation of weight also. The formulæ which
describe these changes were given in 1902 by Professor
Karl Pearson,‡ and when the selection of the persons
forming the sample is made on the basis of one quality
only, these formulæ can be put into the following very
simple form.

Let the standard deviations of (say) four qualities be


* Thomson, 1937 and 19380. 

† Greater homogeneity need not necessarily, in the mathematical
sense, decrease correlation, and occasionally it does not do so in
actual psychological experiments. But it almost always does so.

‡ These formulæ are not, as was once thought, only applicable if
all distributions are normal (see Lawley, 1943c, where the necessary
conditions are stated). They have been found by trial to give good
results even when the sample has been made by cutting off a tail, or
both tails, of the distribution.
in the complete population—we must, of course, in each
case define what we mean by the complete population, as
for example all living adults who were born in Scotland—
given by $\Sigma_1$, $\Sigma_2$, $\Sigma_3$, and $\Sigma_4$, and their correlations by
$R_{12}$, $R_{13}$, etc. Now let a selection of persons be made who
are more homogeneous in the first quality—say, in an
intelligence test which has been given to them all—so that
its standard deviation in the sample is only $\sigma_1$, and write—

\[ p_1 = \frac{\sigma_1}{\Sigma_1}. \]

The smaller $p_1$ is, the more homogeneous the group is in
intelligence-test score. If we write—

\[ q_1 = \sqrt{1 - p_1^2}, \]

$q_1$ will be larger, the greater the shrinkage in intelligence
score-scatter from $\Sigma_1$ to $\sigma_1$. We shall call $q_1$ the “shrink-
age” of the quality No. 1 in the sample.

The other qualities 2, 3, and 4, being correlated with the
first, will tend to shrink with it, and their expected shrink-
ages $q_2$, $q_3$, and $q_4$ can be calculated from the formula

\[ q_i = q_1 R_{1i}. \]

For the sort of reason indicated earlier in this paragraph,
the correlations of the four qualities—which we are for
simplicity in exposition assuming to be positively correlated
in the whole population—will also alter, according to the
formula—

\[ r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}. \]

2. Elementary proof.—This formula can be readily
proved, for the case where the average is unchanged, by 
using our geometrical model of correlation, in which tests 
or other variables are represented by lines all crossing each 
other at the “average man,” and at angles with one 
another whose cosines equal the correlation coefficients 
between the tests (see Chapter VI). 

In this perspective figure let OA, OB, and OC be three
lines in three-fold space representing three tests. The
triangle ABC is in a plane at right angles to OA. Write

\[ \cos\alpha = \cos BOA = R_{12}, \qquad \cos\beta = \cos COA = R_{13}, \qquad \cos\gamma = \cos BOC = R_{23}. \]

Take the distance OA as unity. Each test is standardized,
so that its standard deviation is unity. Now let the
standard deviation of Test 1 be reduced so that it becomes
$p_1 = OD$. This means, in our geometrical model, that the
whole three-fold space in which our lines OA, OB, and OC
exist is compressed from A towards O, and every line
parallel to this is shortened in the same way. The point B
moves up to E, and the point C to F. The whole triangle
ABC is lifted up, remaining at right angles to the line OA,
to a new position DEF. The test lines OB and OC become
OE and OF. The angle $\gamma = BOC$ has become the angle
$\gamma' = EOF$, and $\cos\gamma'$ represents the new correlation
coefficient between Tests 2 and 3. Our object is to find
$\cos\gamma'$ in terms of the known quantities $\alpha$, $\beta$, $\gamma$, and $p_1$.
One method is to express $BC^2$ in terms of the triangle BOC,
and $EF^2$ in terms of the triangle EOF, and equate them,
since BC = EF.

[Figure 31.]

First note that

\[ OB^2 - OE^2 = OA^2 - OD^2 = 1 - p_1^2 = q_1^2, \]

and similarly

\[ OC^2 - OF^2 = q_1^2. \]

Also $p_2 = OE/OB$, and $p_3 = OF/OC$. Further,

\[ q_2^2 = 1 - p_2^2 = \frac{OB^2 - OE^2}{OB^2} = \frac{q_1^2}{OB^2}, \]

and similarly $q_3^2 = q_1^2/OC^2$. Now, since

\[ BC^2 = OB^2 + OC^2 - 2\,OB \cdot OC \cos\gamma \]

and

\[ EF^2 = OE^2 + OF^2 - 2\,OE \cdot OF \cos\gamma', \]

we have, subtracting,

\[
\begin{aligned}
0 &= (OB^2 - OE^2) + (OC^2 - OF^2) - 2\,OB \cdot OC \cos\gamma + 2\,OE \cdot OF \cos\gamma'\\
  &= 2q_1^2 - 2\,OB \cdot OC \cos\gamma + 2\,OE \cdot OF \cos\gamma',
\end{aligned}
\]

whence, dividing numerator and denominator by $OB \cdot OC$,

\[ \cos\gamma' = \frac{OB \cdot OC \cos\gamma - q_1^2}{OE \cdot OF} = \frac{\cos\gamma - q_2 q_3}{p_2 p_3} = \frac{R_{23} - q_2 q_3}{p_2 p_3}, \]

or $r_{23}$.

3. A numerical example.—Let us define our “whole
population” as all the eleven-year-old children in Mas-
sachusetts, and let us suppose (the numbers are entirely
fictitious) that the standard deviations of all their scores
in four tests are:

    1. Stanford-Binet test        16.5 = $\Sigma_1$
    2. The X reading test         24.9 = $\Sigma_2$
    3. The Y arithmetic test      27.3 = $\Sigma_3$
    4. The Z drawing scale        14.2 = $\Sigma_4$

while the correlations between these four, in a State-wide
survey, are (these are the R correlations):

           1      2      3      4

    1      .     .69    .75    .32
    2     .69     .     .54    .18
    3     .75    .54     .     .06
    4     .32    .18    .06     .


Now let a sample of Massachusetts eleven-year-olds be
taken who are less widely scattered in intelligence, with
a standard deviation in their Stanford-Binet scores of
only 10.2. How will all the other quantities listed above
tend to alter in this sample? We have, using the formulæ
quoted, the following—

\[ p_1 = \frac{10.2}{16.5} = .618, \qquad q_1 = \sqrt{1 - .618^2} = .786, \]

and from $q_i = q_1 R_{1i}$ we have the other shrinkages $q_i$, and
thence the coefficients $p_i$ and the new standard deviations
$\sigma_i = p_i \Sigma_i$:

           1       2       3       4

    q     .786    .542    .590    .252
    p     .618    .840    .808    .968
    σ    10.2    20.9    22.1    13.7

The formula for $r_{ij}$ then enables us at once to calculate
the correlations to be expected in the sample, namely:

           1       2       3       4

    1      .      .509    .574    .204
    2     .509     .      .325    .054
    3     .574    .325     .     -.113
    4     .204    .054   -.113     .


The greater homogeneity in the sample has made all the
correlation coefficients smaller, and has indeed made $r_{34}$
become negative.

The reader should note that these standard deviations 
and correlations are what result from selecting on the Stan- 
ford-Binet test, letting the other changes happen in con- 
sequence. It would be quite a different matter to select on 
the X reading test. Even if we did so, so as to reduce the
reading test standard deviation from 24.9 to 20.9 as
happened above, the other changes would be quite differ-
ent. The Stanford-Binet standard deviation would, for
example, not be reduced to 10.2 but only to 15.3. And $r_{13}$
would not be .574, but .722. The difference, in terms of
our Figure 31, is that whereas selecting the Stanford-Binet
corresponded to shortening the line OA and with it all
parallel distances in the space, selecting the reading test
corresponds to shortening OB and all distances parallel to
it: quite a different distortion of the space.
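The whole of this numerical example follows from the two formulæ for the shrinkages and the new correlations, and can be reproduced as below. This is only a sketch: the function name `select_on` and its arguments are illustrative, and because it carries full precision instead of three-figure working, its results may differ from the printed tables by a unit or so in the third decimal place.

```python
import math

SIGMA = [16.5, 24.9, 27.3, 14.2]      # population standard deviations
R = [[1.0, 0.69, 0.75, 0.32],         # population correlations
     [0.69, 1.0, 0.54, 0.18],
     [0.75, 0.54, 1.0, 0.06],
     [0.32, 0.18, 0.06, 1.0]]

def select_on(k, new_sd, sigma, R):
    """Expected sample values after direct selection on variable k (0-based)."""
    n = len(sigma)
    p_k = new_sd / sigma[k]
    q_k = math.sqrt(1 - p_k ** 2)
    q = [q_k * R[k][i] for i in range(n)]            # shrinkages
    p = [math.sqrt(1 - qi ** 2) for qi in q]
    sd = [pi * s for pi, s in zip(p, sigma)]         # new standard deviations
    r = [[1.0 if i == j else (R[i][j] - q[i] * q[j]) / (p[i] * p[j])
          for j in range(n)] for i in range(n)]      # new correlations
    return q, p, sd, r

# Selection on the Stanford-Binet test, reducing its scatter to 10.2.
q, p, sd, r = select_on(0, 10.2, SIGMA, R)

# Selection on the reading test instead, reducing its scatter to 20.9.
q_b, p_b, sd_b, r_b = select_on(1, 20.9, SIGMA, R)

print([round(x, 1) for x in sd], round(r[2][3], 3))
print(round(sd_b[0], 1), round(r_b[0][2], 3))
```

The two calls make the contrast of the last paragraph concrete: the same reduction in the reading test's scatter arises in both, but the remaining quantities differ according to which test was directly selected.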
4. From sample to population In the above numeri 


tion of Massachusetts eleven-year-old children, and ask 
what they would become in a sample with a smaller scatter 
in the Stanford-Binet score. The problem might, however, 
be reversed, in which case, with a little care, the same
formula can be used.

Let us suppose that we know from experiment the above
facts about the sample—the standard deviations 10.2,
20.9, 22.1, 13.7, and all the correlation coefficients in the
table, .509, .574, etc.—and that we know further that the
standard deviation of Stanford-Binet scores in the whole
population in question is 16.5. The sample we have
worked with is obviously a biased one, restricted in range
of Stanford-Binet scores, and we wish to estimate what our
correlation coefficients would have been if we had tested
all Massachusetts eleven-year-olds, or, at least, an un-
biased sample. We want, indeed, to work the above
example backwards.

The quantity $p_1$ is, in this direction, greater than unity,
namely—

\[ p_1 = \frac{16.5}{10.2} = 1.618 \]

and

\[ q_1^2 = 1 - p_1^2 = -1.617. \]

The quantity $q_1$ is therefore the square root of a minus
quantity, which we express as

\[ q_1 = \sqrt{1.617}\,i = 1.272i, \qquad \text{where } i = \sqrt{-1}. \]

The other q’s can be got from $q_1$ by the same formula as
before, namely $q_i = q_1 R_{1i}$, where R now means a correlation
coefficient in the sample. Thus—

\[
\begin{aligned}
q_2 &= q_1 R_{12} = 1.272i \times .509 = .647i,\\
q_3 &= q_1 R_{13} = 1.272i \times .574 = .730i.
\end{aligned}
\]

Then—

\[ p_2^2 = 1 - q_2^2 = 1 + .647^2 \ (\text{for } i^2 = -1) = 1.419; \qquad p_2 = 1.191, \]

and similarly $p_3 = 1.238$.

We then have—

\[ r_{23} = \frac{R_{23} - q_2 q_3}{p_2 p_3} = \frac{.325 - .647i \times .730i}{1.191 \times 1.238} = \frac{.325 + .472}{1.474} = .54 \]


as in the table for the population. In this way that table 
can be completely reconstituted. It is then, of course, 
only an estimate and, moreover, an estimate based on the 
assumption that our sample differs from the population 
only by reason of one of the four variables—namely, the 
Stanford-Binet score—being restricted, deliberately or 
accidentally, the other restrictions being supposed to have 
followed sympathetically by reason of the correlations. 
In few practical examples can we be sure of the mode of 
selection. 
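The imaginary shrinkages need no special handling if complex arithmetic is used throughout. The sketch below works the example backwards for Tests 2 and 3 with Python's `cmath` module; the variable names are illustrative, and lower-case `r` here denotes the sample values from which the population value `R23` is estimated.

```python
import cmath

sample_sd_1, population_sd_1 = 10.2, 16.5
r12, r13, r23 = 0.509, 0.574, 0.325      # correlations observed in the sample

p1 = population_sd_1 / sample_sd_1        # greater than unity in this direction
q1 = cmath.sqrt(1 - p1 ** 2)              # square root of a negative quantity
q2, q3 = q1 * r12, q1 * r13               # both purely imaginary
p2, p3 = cmath.sqrt(1 - q2 ** 2), cmath.sqrt(1 - q3 ** 2)

estimate = (r23 - q2 * q3) / (p2 * p3)    # the imaginary parts cancel
R23 = estimate.real
print(q1, p2.real, p3.real, R23)
```

The product of two imaginary shrinkages is a negative real number, which is why the numerator becomes a sum and the estimated population correlation exceeds the sample value.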

5. Variance of differences between scores. Our numerical 
example enables us to illustrate a very useful fact, that the 
variance of the differences between the scores in two tests 
is independent of the amount of selection if both tests have 
been equally shrunk, and is reasonably constant when this 
condition is not too much departed from. 

For example, $\sigma_d^2$ for the differences between the scores in
Tests 2 and 3 would be, by the formula—

\[ \sigma_d^2 = \sigma_2^2 + \sigma_3^2 - 2 r_{23}\sigma_2\sigma_3, \]

equal in the population to—

\[ 24.9^2 + 27.3^2 - 2 \times 24.9 \times 27.3 \times .54 = 631.15 \]

and in the sample to—

\[ 20.9^2 + 22.1^2 - 2 \times 20.9 \times 22.1 \times .325 = 625.0, \]

that is, almost the same, although $p_2$ does not quite equal
$p_3$. This fact gives another method of estimating a popu-
lation correlation if the sample variance of the
differences can be calculated, and if the standard devia-
tions in the population are known or can be guessed. For
example, suppose a worker with the sample calculated
from his data the value—

\[ \sigma_d^2 = 625 \]
and had reason to think that in the population, or in some
other sample, the standard deviations were 25 and 27 (as
they nearly are in our example), he could estimate the
unknown correlation as—

\[ \frac{25^2 + 27^2 - 625}{2 \times 25 \times 27}. \]

Actually it was .54. But this method would fail badly if
the quantities $p_i$ and $p_j$ were markedly different (Emmett,
1951, B.J.P.Statist., 4, (1)).
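The near-constancy of the variance of differences, and the estimate built on it, can be checked with a few lines. The function names below are illustrative; the second function simply solves the variance formula for the correlation, which is the step the worker in the example is supposed to take.

```python
def difference_variance(sd_a, sd_b, r):
    """Variance of the differences between two scores with correlation r."""
    return sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b

def correlation_from_difference(sd_a, sd_b, var_diff):
    """The same formula solved for r, given assumed standard deviations."""
    return (sd_a ** 2 + sd_b ** 2 - var_diff) / (2 * sd_a * sd_b)

population = difference_variance(24.9, 27.3, 0.54)
sample = difference_variance(20.9, 22.1, 0.325)
estimate = correlation_from_difference(25, 27, 625)

print(round(population, 2), round(sample, 1), round(estimate, 3))
```

Trying the second function with standard deviations that have shrunk unequally shows how quickly the estimate degrades, which is the caution attached to the method above.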

6. Selection and partial correlation.—If a sample is made
completely homogeneous in the Stanford-Binet test, clearly
$p_1 = 0$ and $q_1 = 1$. The same formulæ then give us:


== *587 


           1       2       3       4

    q      1      .69     .75     .32
    p      0      .724    .661    .947
    σ      0     18.0    18.1    13.5

and the resulting correlation coefficients, which in this case
are called “coefficients of partial correlation for constant
Stanford-Binet score,” are, by the same formula:

           1       2       3       4

    1      .       .       .       .
    2      .       .      .047   -.059
    3      .      .047     .     -.287
    4      .     -.059   -.287     .


The correlations of the Stanford-Binet test with the 
others are given by the formula as 0/0, that is, indeter- 
minate. That they are really zero is seen from the fact 
that when p, is taken as not quite zero, but very small, 
these correlations come out by the formula as very small. 
They vanish with p,. 

In this special case of “partial correlation,” where the
directly selected test is so stringently selected that everyone
in the sample has exactly the same score in it, our formula—

\[ r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j} \]

has a more familiar form. For since—

\[ q_i = q_1 R_{1i} \qquad \text{and} \qquad q_1 = 1 \]

in this case of complete shrinkage we have—

\[ q_i = R_{1i} \qquad \text{and} \qquad p_i = \sqrt{1 - R_{1i}^2} \]

so that our formula becomes—

\[ r_{ij} = \frac{R_{ij} - R_{1i}R_{1j}}{\sqrt{1 - R_{1i}^2}\,\sqrt{1 - R_{1j}^2}}, \]

the usual form of a partial correlation coefficient. Its
more conventional notation is, calling the test which is
made constant Test k instead of Test 1—

\[ r_{ij.k} = \frac{r_{ij} - r_{ik}r_{jk}}{\sqrt{1 - r_{ik}^2}\,\sqrt{1 - r_{jk}^2}}. \]

If the “test” which is held constant is the factor g,
this becomes

\[ r_{ij.g} = \frac{r_{ij} - r_{ig}r_{jg}}{\sqrt{1 - r_{ig}^2}\,\sqrt{1 - r_{jg}^2}}, \]

which is called the “specific correlation” between i and j.
Its numerator is the “residue” left after removing the
correlation due to g. If g is the sole cause of correlation,
holding g constant will destroy the correlation and we shall
have—

\[ r_{ij} = r_{ig}r_{jg}, \]

as we already saw from another point of view was the case
in a hierarchical battery, in Section 4 of Chapter I.
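The vanishing of every specific correlation in a hierarchical battery can be seen at once by computing. The sketch below uses g saturations of .9, .8, .7 and .6, the values of the hierarchical example of Chapter I; the function name is illustrative, and the correlation of .80 in the last line is a purely hypothetical figure, chosen to show what happens when two tests share something beyond g.

```python
import math

def partial_correlation(r_ij, r_ik, r_jk):
    """Correlation of i and j with variable k held constant."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Hierarchical battery: every correlation is the product of two g saturations.
g_loading = [0.9, 0.8, 0.7, 0.6]
specific = [
    partial_correlation(g_loading[i] * g_loading[j], g_loading[i], g_loading[j])
    for i in range(4) for j in range(i + 1, 4)
]

# Hypothetical pair correlating .80 instead of .72: a residue remains.
extra = partial_correlation(0.80, 0.9, 0.8)
print(specific, round(extra, 3))
```

All six specific correlations come out as zero, while the hypothetical pair keeps a positive one.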


7. Effect on communalities.—The formula—

\[ r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j} \]

is thus a very useful formula, including partial correlation
as a special case. If the original variances are each taken
as unity, the numerator $R_{ij} - q_i q_j$ for $i \neq j$ gives the new
covariances, while $p_i^2$ and $p_j^2$ are the new variances.

It also includes as a special case the formula known as
the Otis-Kelley formula, which is applicable when two
variates have both shrunk to the same extent (a restriction
not always recognized). If we put $q_i = q_j$ and therefore
$p_i = p_j$ it becomes—

\[
\begin{aligned}
p^2 r_{ij} &= R_{ij} - q^2 = R_{ij} - 1 + p^2,\\
p^2 (1 - r_{ij}) &= 1 - R_{ij},\\
\frac{1 - R_{ij}}{1 - r_{ij}} &= p^2 = \frac{\sigma^2}{\Sigma^2},
\end{aligned}
\]

the Otis-Kelley formula.

It has a still further application (Thomson, 1938b, 456),
for if a matrix of correlations in the wider population has
been analysed by Thurstone’s process, this same formula
gives the new communalities (with one exception) to be
expected in the sample, if we put i = j and understand by
$R_{ii}$ the communality in the wider population, by $r_{ii}$ the
communality in the sample (and not a reliability coefficient,
which is the usual meaning of this symbol). Writing the
usual symbol $h^2$ for communality in the sample, and $H^2$ for
communality in the wider population, we have the formula
in the form—

\[ h_i^2 = \frac{H_i^2 - q_i^2}{p_i^2}. \]

The exception is the new communality of the trait or
quality which has been directly selected, in our example
No. 1 the Stanford-Binet scores. For the directly selected
trait the new communality is given by—

\[ h_1^2 = \frac{p_1^2 H_1^2}{1 - q_1^2 H_1^2} \]

(Thomson, 1938b, 455; and see also Ledermann, 19380).
With these formulæ we can see what is likely to happen 
to a whole factorial analysis when the persons who are the 
subjects of the tests are only a sample of the wider popula- 
tion in which the analysis was first made. 
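
The two communality formulæ can be tried on a small case. The sketch below borrows its figures (population communalities .81 and .64, correlation .72, shrinkage .8) from the hierarchical example of the next section; the function names are illustrative assumptions only.

```python
def communality_indirect(H2, q):
    """New communality of a test that was not directly selected."""
    return (H2 - q * q) / (1.0 - q * q)

def communality_direct(H2, q):
    """New communality of the directly selected test."""
    p2 = 1.0 - q * q                      # new variance of the selected test
    return p2 * H2 / (1.0 - q * q * H2)

q1 = 0.8                                  # shrinkage of Test 1 (variance .36)
q2 = q1 * 0.72                            # shrinkage of Test 2, q_i = q_1 R_i1

h2_test1 = communality_direct(0.81, q1)
h2_test2 = communality_indirect(0.64, q2)

print(round(h2_test1, 2), round(h2_test2, 2))
```

Both communalities fall, that of the directly selected test from .81 to about .61 and that of Test 2 from .64 to about .46.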

8. Hierarchical numerical example.—We shall take, in
the first place, the perfectly hierarchical example of our 
Chapter I. But to save space in the tables we shall con- 
sider only the first four tests. Their matrix of correlations, 
with the one common factor and the four specifics added, 
and with communalities inserted in the diagonal cells, was 
as follows : 
        1      2      3      4      g      s1     s2     s3     s4
1     (.81)   .72    .63    .54    .90    .44     .      .      .
2      .72   (.64)   .56    .48    .80     .     .60     .      .
3      .63    .56   (.49)   .42    .70     .      .     .71     .
4      .54    .48    .42   (.36)   .60     .      .      .     .80
g      .90    .80    .70    .60   1.00     .      .      .      .
s1     .44     .      .      .      .    1.00     .      .      .
s2      .     .60     .      .      .      .    1.00     .      .
s3      .      .     .71     .      .      .      .    1.00     .
s4      .      .      .     .80     .      .      .      .    1.00


The bottom right-hand quadrant shows, by its zero 
entries, that the factors are all uncorrelated with one 
another, that is, orthogonal. The tests expressed as linear 
functions of the factors are— 


$$\begin{aligned}
z_1 &= .9g + .436\,s_1 \\
z_2 &= .8g + .600\,s_2 \\
z_3 &= .7g + .714\,s_3 \\
z_4 &= .6g + .800\,s_4
\end{aligned}$$


These equations are only another way of expressing the 
same facts as are shown in the north-east, or the south- 
west, quadrant of the matrix (where only two places of 
decimals are used for the specific loadings, to keep the 
printing regular). 

Let us now suppose that this matrix and these equations 
refer to a wide and defined population, e.g. all Massa- 
chusetts eleven-year-olds, and let us ask what will be the 
most likely matrix of correlations between these tests and 
factors to be found in a sample chosen by their scores in 
Test 1 so as to be more homogeneous. The variance of 
Test 1 in the wider population being taken as unity, let 
us take that in the more homogeneous select sample as 
being $p_1^2 = .36$. We then have, using $q_i = q_1 R_{i1}$, and
treating g and the specifics just like tests, the following
table:

                  1     2     3     4     g     s1    s2    s3    s4
q                .80   .576  .504  .432  .720  .349   .     .     .
p                .60   .817  .864  .902  .694  .937   1     1     1
p² (variance)    .36   .668  .746  .813  .482  .878   1     1     1




For the correlations and communalities, using our 
formula— 


$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

we get (again printing only two decimal places) : 
        1      2      3      4      g      s1     s2     s3     s4
1     (.61)   .53    .44    .36    .78    .28     .      .      .
2      .53   (.46)   .38    .31    .68   -.26    .73     .      .
3      .44    .38   (.32)   .26    .56   -.22     .     .83     .
4      .36    .31    .26   (.21)   .46   -.18     .      .     .89
g      .78    .68    .56    .46   1.00   -.39     .      .      .
s1     .28   -.26   -.22   -.18   -.39   1.00     .      .      .
s2      .     .73     .      .      .      .    1.00     .      .
s3      .      .     .83     .      .      .      .    1.00     .
s4      .      .      .     .89     .      .      .      .    1.00


In the more homogeneous sample, therefore, the 
correlations and the communalities of all the tests have 
sunk. The g column shows what the new correlations of g 
are with the tests; and on examination of the matrix we 
see that these, when cross-multiplied with one another, 
still give the rest of the matrix. Thus— 


$$.78 \times .46 = .36 \;\;(r_{14}), \qquad .68^2 = .46 \;\;(h_2^2)$$

The test matrix is still of rank 1 (Thomson, 1938b, 453),
and these g-column entries can become the diminished 
loadings of the single common factor required by rank 1. 
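
The whole sample matrix can be recomputed from the population matrix in a few lines. The sketch below is only an illustration of the formula $r_{ij} = (R_{ij} - q_i q_j)/(p_i p_j)$ with $q_i = q_1 R_{i1}$; the function name `select_on_first` is an assumption, and the diagonal is filled with unit variances rather than communalities.

```python
import math

def select_on_first(R, p1_sq):
    """Expected correlations after selecting on variable 0.

    R has unit variances in the diagonal; p1_sq is the new variance
    of variable 0 in the sample.
    """
    n = len(R)
    q1 = math.sqrt(1.0 - p1_sq)
    q = [q1 * R[i][0] for i in range(n)]            # shrinkages q_i = q_1 R_i1
    p = [math.sqrt(1.0 - x * x) for x in q]
    r = [[(R[i][j] - q[i] * q[j]) / (p[i] * p[j]) for j in range(n)]
         for i in range(n)]
    return r, q, p

# Population: four tests, g, and four specifics, all treated like tests.
g_load = [0.9, 0.8, 0.7, 0.6]
labels = ["1", "2", "3", "4", "g", "s1", "s2", "s3", "s4"]
n = len(labels)
R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for i, a in enumerate(g_load):
    R[i][4] = R[4][i] = a                             # loading on g
    R[i][5 + i] = R[5 + i][i] = math.sqrt(1 - a * a)  # loading on own specific
    for j, b in enumerate(g_load):
        if i != j:
            R[i][j] = a * b                           # hierarchical correlation

r, q, p = select_on_first(R, 0.36)
for name, row in zip(labels, r):
    print(name.rjust(2), " ".join(f"{x:6.2f}" for x in row))
```

The off-diagonal entries agree with the table above, and the product of any two entries in the g column gives the correlation of the corresponding tests, as rank 1 requires.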

The columns for the specifics $s_2$, $s_3$ (and later specifics
also) still show only one entry. In the bottom right-hand
quadrant, zero entries show that these specifics are still
uncorrelated with one another and with g, that is, g, $s_2$, $s_3$,
and $s_4$ are still orthogonal.

But something has happened to the specific s1. It has 
become correlated with g, and with all the tests. It has 
become an oblique factor, orthogonal still to the other 
specifics, but inclined to g and the tests. It leans further 
away from Test 1 than it formerly did, and makes obtuse 
angles (negative correlation) with the other tests and with g, 
to which it was originally orthogonal. 
But since, as we have already pointed out, the test matrix
with the reduced communalities is still of rank 1, it is
clear that a fresh analysis could be made of the tests into
one common factor and specifics, thus—

$$\begin{aligned}
z_1 &= .778g' + .628\,s_1' \\
z_2 &= .679g' + .734\,s_2 \\
z_3 &= .562g' + .827\,s_3 \\
z_4 &= .462g' + .887\,s_4
\end{aligned}$$

In these equations the factors $g'$, $s_1'$, $s_2$, $s_3$, and $s_4$ are
again orthogonal (uncorrelated), and the loadings shown 
give the correlations and give unit variances. This is the 
analysis which an experimenter would make who began 
with the sample and knew nothing about any test measure- 
ments in the whole population. 

The reader, comparing the loadings in these equations 
with the correlations in the matrix of the sample, will 
rightly conclude that the specifics from $s_2$ onward have not
changed. In the matrix it is clear that they are still 
orthogonal, and their correlations with the tests, in the 
matrix, are the same as their loadings in the equations. 
The tests are, in the sample, more heavily loaded with these 
specifics than they were in the population, but the specifics 
are the same in themselves. 

The new specific $s_1'$ the reader will readily agree to be
different from $s_1$. The latter became oblique in the
sample, whereas $s_1'$ is orthogonal. What now is to be said
about the common factors $g$ (in the population) and $g'$ (in
the sample)? From the fact that the loadings of $g'$, in the
sample equations, are identical with the correlations in
the sample matrix of the original $g$ with the tests, one is
tempted to imagine $g$ and $g'$ to be identical in nature. But
that is not so certain.

If we go back to the equations of the tests in the popu- 
lation, we can rewrite them in the following form— 


$$\begin{aligned}
z_1 &= .467g' + .800g'' + .377\,s_1' \\
z_2 &= .555g' + .576g'' + .600\,s_2 \\
z_3 &= .485g' + .504g'' + .714\,s_3 \\
z_4 &= .417g' + .432g'' + .800\,s_4
\end{aligned}$$

with two common factors $g'$ and $g''$ instead of one common
factor $g$. These equations still give the same correlations.
For example—

$$r_{14} = .467 \times .417 + .800 \times .432 = .540$$

as before.


In these equations the specifics $s_2$, $s_3$, $s_4$ are the same, and
the communalities of Tests 2, 3, and 4 are the same. All
that we have done in these three tests is to divide the
common factor $g$ into two components. The ratio of the
loading of $g'$ to the loading of $g''$ is the same in each of
them. The loadings of $g''$ we have made identical with the
shrinkages q in the table on page 285.

In Test 1 also we have made the loading of $g''$ equal to q,
but here $g'$ cannot be looked upon merely as a component
of $g$. To give the correct correlations, the loading of $g'$
has to be .467 as shown, and the communality of Test 1 has
been raised from its former value (.81) to

$$.467^2 + .800^2 = .858$$

while the loading of the specific has correspondingly sunk.
The factors $g'$, $g''$, and $s_1'$ are a totally new analysis of
Test 1 in the population. Part of the former specific has
been incorporated in the common factors.

Now let the factor $g''$ be abolished, i.e. held constant, so
that the tests (now of less than unit variance, so we write
them with x instead of z) are—

$$\begin{array}{ll}
 & \text{Variances} \\
x_1 = .467g' + .377\,s_1' & .360 \\
x_2 = .555g' + .600\,s_2 & .668 \\
x_3 = .485g' + .714\,s_3 & .746 \\
x_4 = .417g' + .800\,s_4 & .813
\end{array}$$


The reduced variances are the sum of the squares of the
surviving loadings, e.g.—

$$.467^2 + .377^2 = .360$$

The variances, it will be seen, are the $p^2$’s of our tests
as measured in the sample. If each of the last set of
equations is divided through by the square root of its
variance, we arrive at the equations—
$$\begin{aligned}
z_1 &= .778g' + .628\,s_1' \\
z_2 &= .679g' + .734\,s_2 \\
z_3 &= .562g' + .827\,s_3 \\
z_4 &= .462g' + .887\,s_4
\end{aligned}$$

which is the analysis already given as that of an experimenter
who knew only the sample. As to the nature of $g'$,
we can say in Tests 2, 3, and 4 that it is possible to regard
it as a component of the $g$ of the population. But we
cannot do so with assurance in Test 1. There its nature is
more dubious. At all events, it is not the same common
factor as in the population, and at best we can say that it
is one of its components.
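
The division of $g$ into $g'$ and $g''$, and the subsequent restandardization, can be retraced arithmetically. This is a sketch under the assumptions of the example (shrinkage .8 in Test 1, loadings .9, .8, .7, .6); the variable names are illustrative, and the loading of $g'$ in Test 1 is obtained here from the requirement that $r_{12}$ be reproduced.

```python
import math

a = [0.9, 0.8, 0.7, 0.6]                    # population loadings on g
q1 = 0.8                                    # shrinkage of Test 1
R_i1 = [1.0] + [a[0] * x for x in a[1:]]    # variance of Test 1, then r_12, r_13, r_14
q = [q1 * r for r in R_i1]                  # loadings given to g''
p = [math.sqrt(1 - x * x) for x in q]       # standard deviations in the sample

# Loadings on g' in the population.
g_prime = [0.0] * 4
for i in (1, 2, 3):                         # Tests 2, 3, 4: communality unchanged
    g_prime[i] = math.sqrt(a[i] ** 2 - q[i] ** 2)
# Test 1: chosen so that r_12 = .72 is still reproduced.
g_prime[0] = (a[0] * a[1] - q[0] * q[1]) / g_prime[1]

raised_communality = g_prime[0] ** 2 + q[0] ** 2

# Hold g'' constant and restandardize: divide by each test's new s.d.
sample_loadings = [g_prime[i] / p[i] for i in range(4)]

print([round(x, 3) for x in g_prime])
print(round(raised_communality, 3))
print([round(x, 3) for x in sample_loadings])
```

The last line gives the loadings of the experimenter who knew only the sample; the third decimal place may differ by a unit from the printed figures because of rounding.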

9. A sample all alike in Test 1.—These phenomena are 
still more striking if we consider a case where the sample 
is composed of persons who are all alike in Test 1. It 
would be an excellent exercise for the reader to calculate 
the resulting matrix of correlations for tests and population 
factors in this case. The tests act in this case as though 
their original equations in the population had been— 


$$\begin{aligned}
z_1 &= g'' \\
z_2 &= .349g' + .720g'' + .600\,s_2 \\
z_3 &= .305g' + .630g'' + .714\,s_3 \\
z_4 &= .262g' + .540g'' + .800\,s_4
\end{aligned}$$

and then $g''$ had become zero, i.e. a constant with no
variance.

It perhaps helps to a further understanding of what is 
happening to the factors during selection if we realize that 
holding the score of Test 1 constant does not hold its factors 
$g$ and $s_1$ constant. They can vary in the sample from
man to man, but since

$$z_1 = .9g + .436\,s_1$$

remains constant, a man in the sample who has a high $g$
must have a low $s_1$—that is, these factors are negatively
correlated in the sample. And because they are thus
negatively correlated, those members of the sample who
have high $g$’s, and who will therefore tend to do well in
Tests 2, 3, and 4, will tend to have values below average
(negative values) for their $s_1$, which will therefore be
negatively correlated with these tests, in this sample.
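
The same selection formula, with the shrinkage of Test 1 put equal to unity, shows how strong this negative correlation becomes. The sketch below is illustrative only; `selected_corr` is an assumed helper name, and the population values are those of the hierarchical example.

```python
import math

def selected_corr(R_xy, R_x1, R_y1, q1=1.0):
    """Expected correlation of x and y after selection on Test 1.

    R_x1 and R_y1 are the population correlations of x and y with Test 1;
    q1 = 1 means that every person in the sample has the same Test 1 score.
    """
    qx, qy = q1 * R_x1, q1 * R_y1
    return (R_xy - qx * qy) / math.sqrt((1 - qx ** 2) * (1 - qy ** 2))

a1 = 0.9
s1_load = math.sqrt(1 - a1 * a1)            # .436, loading of s1 in Test 1

r_g_s1 = selected_corr(0.0, a1, s1_load)    # g with s1
r_g_t2 = selected_corr(0.8, a1, 0.72)       # g with Test 2
r_s1_t2 = selected_corr(0.0, s1_load, 0.72) # s1 with Test 2

print(round(r_g_s1, 3), round(r_g_t2, 3), round(r_s1_t2, 3))
```

With Test 1 held quite constant, $g$ and $s_1$ are perfectly negatively correlated, and the correlation of $s_1$ with Test 2 is equal and opposite to that of $g$ with Test 2.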


So far in our examples we have assumed the sample to
be more homogeneous than the population. But a sample
can be selected to be less homogeneous. In such a case
the same formulæ will serve, if we simply make the capital
letters refer to the sample and the small to the population.
In fact, the same tables, with their rôles reversed, can
illustrate this case. In practical life we usually know which
of two groups we would call the sample, and which the
population. But mathematically there is no distinction,
the one is a distortion of the other, and which is the “true”
state of affairs is a question without meaning.

It must also throughout be remembered that all these
formulæ refer, not to consequences which are certain to
follow, but to consequences which are to be expected. If
actual samples were made the values experimentally found
for the correlations, loadings, etc., would oscillate about
those given by our formulæ, violently in the case of small
samples, only slightly in the case of large samples.
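
That an actual sample only oscillates about the expected values can be illustrated by a small simulation. Everything in this sketch is an assumption made for illustration: normally distributed factors, a population of 60,000 artificial persons, a fixed random seed, and a selection rule that keeps only those whose Test 1 score lies within one standard deviation of the mean.

```python
import math
import random

random.seed(1)                       # fixed seed so that the run is repeatable
S1 = math.sqrt(1 - 0.9 ** 2)         # specific loading of Test 1 (.436)

kept = []
for _ in range(60000):
    g = random.gauss(0, 1)
    z1 = 0.9 * g + S1 * random.gauss(0, 1)
    z2 = 0.8 * g + 0.6 * random.gauss(0, 1)
    if abs(z1) < 1.0:                # hypothetical selection rule on Test 1
        kept.append((z1, z2))

def moments(pairs):
    """Variances and covariance of the two scores in the selected sample."""
    n = len(pairs)
    m1 = sum(x for x, _ in pairs) / n
    m2 = sum(y for _, y in pairs) / n
    v1 = sum((x - m1) ** 2 for x, _ in pairs) / n
    v2 = sum((y - m2) ** 2 for _, y in pairs) / n
    c12 = sum((x - m1) * (y - m2) for x, y in pairs) / n
    return v1, v2, c12

v1, v2, c12 = moments(kept)
r12_sample = c12 / math.sqrt(v1 * v2)

p1_sq = v1                           # observed variance of Test 1 in the sample
q1 = math.sqrt(1 - p1_sq)
q2 = q1 * 0.72                       # q_2 = q_1 R_21
r12_formula = (0.72 - q1 * q2) / math.sqrt(p1_sq * (1 - q2 * q2))

print(round(p1_sq, 3), round(r12_sample, 3), round(r12_formula, 3))
```

With so large a sample the observed correlation should lie close to that given by the formula; with a few dozen persons the departure would be much greater.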

10. An example of rank 2.—The above example has only
one common factor. We turn next to consider an example
with two. Again it is, we suppose, the first test according
to which the sample is deliberately selected, and again
we suppose the “shrinkage” $q_1$ to be .8. The matrices
of correlations and communalities, in the population and
in the sample, are then as follows, the two factors $f_1$ and $f_2$
and the specifics being treated in the calculation exactly
as if they were tests. To economize room on the page,
we omit the later specifics:


Correlations in the Population

        1      2      3      4      5      f1     f2     s1     s2
1     (.65)   .46    .59    .36    .41    .70    .40    .59     .
2      .46   (.37)   .36    .26    .23    .60    .10     .     .79
3      .59    .36   (.61)   .32    .45    .50    .60     .      .
4      .36    .26    .32   (.20)   .22    .40    .20     .      .
5      .41    .23    .45    .22   (.34)   .30    .50     .      .
f1     .70    .60    .50    .40    .30  (1.00)    .      .      .
f2     .40    .10    .60    .20    .50     .   (1.00)    .      .
s1     .59     .      .      .      .      .      .   (1.00)    .
s2      .     .79     .      .      .      .      .      .   (1.00)
Correlations in the Sample

        1      2      3      4      5      f1     f2     s1     s2
1     (.40)   .30    .40    .23    .26    .51    .25    .40     .
2      .30   (.27)   .23    .17    .12    .51   -.02   -.21    .85
3      .40    .23   (.50)   .22    .35    .32    .54   -.29     .
4      .23    .17    .22   (.13)   .14    .30    .12   -.16     .
5      .26    .12    .35    .14   (.26)   .15    .44   -.19     .
f1     .51    .51    .32    .30    .15  (1.00)  -.23   -.36     .
f2     .25   -.02    .54    .12    .44   -.23  (1.00)  -.18     .
s1     .40   -.21   -.29   -.16   -.19   -.36   -.18  (1.00)    .
s2      .     .85     .      .      .      .      .      .   (1.00)


We see here a new phenomenon. The two common
factors $f_1$ and $f_2$ in the population were orthogonal to one
another, as is shown by the zero correlation between them.
But in the sample they are negatively correlated (−.228);
that is, they are oblique. We begin to see a generalization
which can be algebraically proved, that all the factors,
common and specific, which are concerned with the directly
selected test(s) become oblique to each other and to all the tests,
but the specifics of the indirectly selected tests remain orthogonal
to everything, except each to its own test.

But the matrix of the tests themselves is still of rank 2, 
and an experimenter working only with the sample would 
find this out, although he would know nothing about the 
population matrix. He would therefore set to work to 
analyse it into two common factors, orthogonal to one 
another. A Thurstone analysis comes out in two common 
factors exactly, and can be rotated until all the loadings 
are positive. For example: 


Test            1      2      3      4      5

Factor f1′    .570   .521   .436   .332   .238
Factor f2′    .276   .000   .555   .130   .452

These factors $f'$, however, are clearly a different pair
from the factors $f$ in the original population. In the
sample, those original factors ($f$) are oblique; these ($f'$)
are orthogonal.
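
The reader may verify that the orthogonal loadings $f'$ reproduce the sample correlations, and that the original factors have become oblique. The sketch below assumes the population loadings .70, .60, .50, .40, .30 and .40, .10, .60, .20, .50 read from the population matrix, and takes the second loading of Test 2 in the sample as .000, a value inferred from its communality; all names are illustrative.

```python
import math

a = [0.7, 0.6, 0.5, 0.4, 0.3]         # population loadings on f1
b = [0.4, 0.1, 0.6, 0.2, 0.5]         # population loadings on f2
q1 = 0.8                              # shrinkage of Test 1

def R(i, j):
    """Population correlation, with unit variance in the diagonal."""
    return 1.0 if i == j else a[i] * a[j] + b[i] * b[j]

q = [q1 * R(i, 0) for i in range(5)]
p = [math.sqrt(1 - x * x) for x in q]

def r_sample(i, j):
    """Expected sample correlation from the selection formula."""
    return (R(i, j) - q[i] * q[j]) / (p[i] * p[j])

# Orthogonal analysis of the sample (second loading of Test 2 assumed .000).
f1 = [0.570, 0.521, 0.436, 0.332, 0.238]
f2 = [0.276, 0.000, 0.555, 0.130, 0.452]

def r_from_factors(i, j):
    return f1[i] * f1[j] + f2[i] * f2[j]

max_gap = max(abs(r_sample(i, j) - r_from_factors(i, j))
              for i in range(5) for j in range(i + 1, 5))

# Correlation of the original factors f1 and f2 in the sample.
q_f1, q_f2 = q1 * a[0], q1 * b[0]
r_f1_f2 = (0.0 - q_f1 * q_f2) / math.sqrt((1 - q_f1 ** 2) * (1 - q_f2 ** 2))

print(round(max_gap, 4), round(r_f1_f2, 3))
```

The largest discrepancy is of the order of a unit in the third decimal place, which is no more than the rounding of the printed loadings would lead us to expect.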

Again the whole phenomenon is reversible. The second 
matrix (with the orthogonal factors f’) might refer to the 
population, and a sample picked with a suitable increased
scatter of Variate 1. All our formulæ could be worked
backwards, and we should arrive at the matrix beginning
(.65), referring now to the sample. The $f'$ factors would
have become oblique, and a new analysis, suitably rotated, 
would give us the other factors f. 

It becomes evident that the orthogonal factors we obtain 
by the analysis of tests depend upon the subpopulation we 
have tested. They are not realities in any physical sense 
of the word; they vary and change as we pass from one 
body of men to another. It is possible, and this is a hope 
hinted at in Thurstone’s book The Vectors of Mind, that if 
we could somehow identify a set of factors throughout all 
their changes from sample to sample (in most of which 
they would be oblique) as being in some way unique, we 
might arrive at factors having some measure of reality 
and fixity. Thurstone, in his latest book Multiple Factor 
Analysis, believes that he has achieved this, and that his 
oblique Simple Structure is invariant. His claim is con- 
sidered in our next chapter. It is, in the present writer's 
opinion, justifiable only for univariate selection, not for 
multivariate, which is not merely repeated univariate 
selection. 

11. Random selection.—These considerations deal with
the results to be expected when a sample is deliberately 
selected so that the variance of one test is changed to some 
desired extent. The new variances and the changed 
correlations of the other tests given by our formula— 


$$r_{ij} = \frac{R_{ij} - q_i q_j}{p_i p_j}$$

are not the certain result of our action in selecting for Test 1. 
If we selected a large number of samples of the same size, 
all with the same reduced variance in Test 1, they would 
not all be alike in the resulting correlations. On the
contrary, they would all be different. But most of them would
be like the expected set, few would depart widely from that ; 
and the departures would be in both directions, some 
samples lying on the one side, others on the other side, 
of our expectation. 


If now, instead of selecting samples which are all alike
in the variance of one nominated test, we take a large 
number of random samples of the same size, what would we 
find? Among them would be a number which were alike 
in the variance of Test 1, and these in the other part of 
the correlation matrix would have values which varied 
round about those given by our formula. We could also 
pick out, instead of a set all alike in the variance of Test 1, 
a different set all alike in the variance of Test 4, say ; 
and these would have values in the remainder of the matrix 
oscillating about our formula, in which Test 4 would replace 
Test 1. In short, a complex family of random samples 
would show a structure among themselves such that if we 
fix any one variance the average of that array of samples 
obeys our formula.* Random sampling will not merely 
add an “ error specific ” to existing factors, it will make 
complex changes in the common factors. 

* On the author's suggestion, Dr. W. Ledermann has since
proved this conjecture analytically (Biometrika, 1939, 30,
295-304). His results cover also the case of multivariate
selection (see next chapter).


CHAPTER XIX 


THE INFLUENCE OF MULTIVARIATE 
SELECTION * 


1. Altering two variances and the covariance.—In the pre- 
ceding chapter we have discussed the changes which occur 
in the variances and correlations of a set of tests, and in 
their factors, when the sample of persons tested is chosen 
according to their performance in one of the tests: we 
are next going to see the results of picking our sample by 
their performances in more than one of the tests, first of 
all in two of them. Take again, the perfectly hierarchical 
example of the last chapter. We must this time go as far 
as six tests in order to see all the consequences. The matrix 
of correlations of these tests and their factors will be 
simply an extension of that printed on page 285. 

Now let us imagine a sample picked so that the variance 
of Test 1 and also that of Test 2 is intentionally altered, 
and further, their covariance (and hence their correlation) 
changed to some predetermined value. 

It is at once clear that in these two directly selected
tests the factorial composition will in general be changed 
—can indeed be changed to anything which is not incom- 
patible with common sense and the laws of logic. What, 
however, will be the resulting sympathetic changes in the 
variances and covariances of the other tests of the battery ? 

* Thomson, 1937; Thomson and Ledermann, 1938.

In Chapter XVIII we altered the variance of Test 1 from
unity to .36. The consequent diminution in variance to be
expected in Test 2 was, as is shown on page 285, from
unity to .668, and the consequent change in correlation
from .72 to .53. Here, however, let us pick our sample so
that the variance of the second test is also diminished to
.36, and so that the correlation between them, instead of
falling, rises to .833. We have, that is to say, chosen
people for our sample who tend to be rather more alike
than usual in these two test scores, as well as being closely
grouped in each, an unusual but not an inconceivable
sample. Natural selection (which includes selection by the
other sex in mating) has no doubt often preferred individuals
in whom two organs tended to go together, as
long legs with long arms, and the same sort of thing might
occur in mental traits. In terms of variance and covariance
we have changed the matrix:


$$R_{pp} = \begin{pmatrix} 1.00 & .72 \\ .72 & 1.00 \end{pmatrix}$$

to the matrix:

$$V_{pp} = \begin{pmatrix} .36 & .30 \\ .30 & .36 \end{pmatrix}$$

for

$$\frac{.30}{\sqrt{.36 \times .36}} = \frac{5}{6} = .833,$$

the new correlation. Notice that the diagonal entries here
(unities in $R_{pp}$, and .36, .36 in $V_{pp}$) are the variances, not
the communalities.

2. Aitken’s multivariate selection formula.—We shall
symbolically represent the whole original matrix of variances
and covariances by:

$$R = \begin{pmatrix} R_{pp} & R_{pq} \\ R_{qp} & R_{qq} \end{pmatrix}$$

where the subscript p refers to the directly selected or
picked tests, and the subscript q to all the other tests and
the factors. $R_{pq}$ (and also $R_{qp}$) means the matrix of
covariances of the picked tests with all the others, including
the factors. $R_{qq}$ means the matrix of variances and
covariances of the latter among themselves. Since at the
outset the tests and factors are all assumed to be standardized,
the variances in this whole R matrix are all
unity, and the covariances are simply coefficients of
correlation. In our case the R matrix is:


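Analysis in the Population

       1     2     3     4     5     6     g     s1    s2    s3    s4    s5    s6
1    1.00   .72   .63   .54   .45   .36   .90   .44    .     .     .     .     .
2     .72  1.00   .56   .48   .40   .32   .80    .    .60    .     .     .     .
3     .63   .56  1.00   .42   .35   .28   .70    .     .    .71    .     .     .
4     .54   .48   .42  1.00   .30   .24   .60    .     .     .    .80    .     .
5     .45   .40   .35   .30  1.00   .20   .50    .     .     .     .    .87    .
6     .36   .32   .28   .24   .20  1.00   .40    .     .     .     .     .    .92
g     .90   .80   .70   .60   .50   .40  1.00    .     .     .     .     .     .
s1    .44    .     .     .     .     .     .   1.00    .     .     .     .     .
s2     .    .60    .     .     .     .     .     .   1.00    .     .     .     .
s3     .     .    .71    .     .     .     .     .     .   1.00    .     .     .
s4     .     .     .    .80    .     .     .     .     .     .   1.00    .     .
s5     .     .     .     .    .87    .     .     .     .     .     .   1.00    .
s6     .     .     .     .     .    .92    .     .     .     .     .     .   1.00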


The $R_{pp}$ matrix is the square 2 × 2 matrix, the $R_{qq}$ matrix
the square 11 × 11 matrix, while $R_{pq}$ has two rows and
eleven columns, $R_{qp}$ being the same transposed.

Our object is to find what may be expected to happen
to the rest of the matrix when $R_{pp}$ is changed to $V_{pp}$.
Formulæ for this purpose were first found by Karl Pearson,
and were put into the matrix form in which we are about
to quote them by A. C. Aitken (Aitken, 1934). The matrix
changes to:

$$\begin{pmatrix} V_{pp} & V_{pp} R_{pp}^{-1} R_{pq} \\ R_{qp} R_{pp}^{-1} V_{pp} & R_{qq} - R_{qp}\left(R_{pp}^{-1} - R_{pp}^{-1} V_{pp} R_{pp}^{-1}\right) R_{pq} \end{pmatrix}$$

and in order to explain the meaning of these formulæ we
shall carry out the calculation for a part of the above matrix
only (the first four tests), with a strong recommendation to
the reader to perform the whole calculation systematically.
If we confine ourselves to the first four tests we have—


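$$R_{pp} = \begin{pmatrix} 1.00 & .72 \\ .72 & 1.00 \end{pmatrix}, \qquad R_{qq} = \begin{pmatrix} 1.00 & .42 \\ .42 & 1.00 \end{pmatrix}$$

$$R_{pq} = \begin{pmatrix} .63 & .54 \\ .56 & .48 \end{pmatrix}, \qquad R_{qp} = \begin{pmatrix} .63 & .56 \\ .54 & .48 \end{pmatrix}$$
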
The most tiresome part of the calculation, if the number
of directly selected tests is large, is to find $R_{pp}^{-1}$, the
reciprocal of the matrix $R_{pp}$, such that the product

$$R_{pp} R_{pp}^{-1} = I$$

where I is the so-called “unit matrix” which has unit
entries in the diagonal and zero entries everywhere else.
The method of doing this is given in Chapter XIV,
Section 9, page 210. In the present example, where $R_{pp}$
is only of dimensions 2 × 2, we soon find—

$$R_{pp}^{-1} = \begin{pmatrix} 2.0764 & -1.4950 \\ -1.4950 & 2.0764 \end{pmatrix}$$

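When the reciprocal matrix $R_{pp}^{-1}$ has thus been calculated,
the best way of proceeding is to find—

$$C = R_{pp}^{-1} R_{pq} \qquad \text{and} \qquad D = R_{qq} - R_{qp} C$$

In the case of our example these are—

$$C = \begin{pmatrix} 2.0764 & -1.4950 \\ -1.4950 & 2.0764 \end{pmatrix} \begin{pmatrix} .63 & .54 \\ .56 & .48 \end{pmatrix} = \begin{pmatrix} .4709 & .4037 \\ .2209 & .1894 \end{pmatrix}$$

$$D = \begin{pmatrix} 1.00 & .42 \\ .42 & 1.00 \end{pmatrix} - \begin{pmatrix} .63 & .56 \\ .54 & .48 \end{pmatrix} \begin{pmatrix} .4709 & .4037 \\ .2209 & .1894 \end{pmatrix} = \begin{pmatrix} 1.00 & .42 \\ .42 & 1.00 \end{pmatrix} - \begin{pmatrix} .4204 & .3604 \\ .3604 & .3089 \end{pmatrix} = \begin{pmatrix} .5796 & .0596 \\ .0596 & .6911 \end{pmatrix}$$

subtraction of matrices being carried out by subtracting
each element from the corresponding one. We next need

$$V_{pp} C = \begin{pmatrix} .36 & .30 \\ .30 & .36 \end{pmatrix} \begin{pmatrix} .4709 & .4037 \\ .2209 & .1894 \end{pmatrix} = \begin{pmatrix} .2358 & .2022 \\ .2208 & .1893 \end{pmatrix}$$

which gives us the new covariances of the directly selected
tests with those indirectly selected. For $V_{qq}$ we need still
$C'(V_{pp}C)$, where the prime indicates that the matrix is
transposed (rows becoming columns)—

$$C'(V_{pp}C) = \begin{pmatrix} .4709 & .2209 \\ .4037 & .1894 \end{pmatrix} \begin{pmatrix} .2358 & .2022 \\ .2208 & .1893 \end{pmatrix} = \begin{pmatrix} .1598 & .1370 \\ .1370 & .1175 \end{pmatrix}$$

and then—

$$V_{qq} = D + C'(V_{pp}C) = \begin{pmatrix} .5796 & .0596 \\ .0596 & .6911 \end{pmatrix} + \begin{pmatrix} .1598 & .1370 \\ .1370 & .1175 \end{pmatrix} = \begin{pmatrix} .7394 & .1966 \\ .1966 & .8086 \end{pmatrix}$$
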

We now can write down the whole new 4 x 4 matrix 
of variances and covariances. In the same way, had we 
included the other tests and the factors, we would have 
arrived at the whole new 13 × 13 matrix for all the
variances and covariances which we now print.* The 
values calculated above for the first four tests will be 
recognized in its top left-hand corner. (The diagonal 
entries are variances, not communalities.) 
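
The steps of this calculation for the first four tests may be set out as a short program. This is only a sketch in plain Python with small helper functions of our own naming (`matmul`, `transpose`, `inverse_2x2`, `combine`); it follows the order C, D, $V_{pp}C$, $C'(V_{pp}C)$ used above.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def combine(A, B, sign):
    """Element-by-element A + sign * B."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

R_pp = [[1.00, 0.72], [0.72, 1.00]]   # Tests 1 and 2 in the population
R_pq = [[0.63, 0.54], [0.56, 0.48]]   # Tests 1, 2 with Tests 3, 4
R_qq = [[1.00, 0.42], [0.42, 1.00]]   # Tests 3 and 4
V_pp = [[0.36, 0.30], [0.30, 0.36]]   # Tests 1 and 2 in the sample

R_inv = inverse_2x2(R_pp)
C = matmul(R_inv, R_pq)
D = combine(R_qq, matmul(transpose(R_pq), C), -1)
V_pq = matmul(V_pp, C)                           # new covariances p with q
V_qq = combine(D, matmul(transpose(C), V_pq), +1)

for row in V_pq + V_qq:
    print(" ".join(f"{x:.4f}" for x in row))
```

The printed rows may differ from the figures in the text by a unit in the fourth decimal place, since the program does not round the intermediate matrices.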


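Covariances in the Sample

       1     2     3     4     5     6     g     s1    s2    s3    s4    s5    s6
1     .36   .30   .24   .20   .17   .13   .34   .13   .05    .     .     .     .
2     .30   .36   .22   .19   .16   .13   .32   .04   .18    .     .     .     .
3     .24   .22   .74   .20   .16   .13   .33  -.14  -.07   .71    .     .     .
4     .20   .19   .20   .81   .14   .11   .28  -.12  -.06    .    .80    .     .
5     .17   .16   .16   .14   .87   .09   .23  -.10  -.05    .     .    .87    .
6     .13   .13   .13   .11   .09   .91   .19  -.08  -.04    .     .     .    .92
g     .34   .32   .33   .28   .23   .19   .47  -.19  -.10    .     .     .     .
s1    .13   .04  -.14  -.12  -.10  -.08  -.19   .70   .32    .     .     .     .
s2    .05   .18  -.07  -.06  -.05  -.04  -.10   .32   .43    .     .     .     .
s3     .     .    .71    .     .     .     .     .     .   1.00    .     .     .
s4     .     .     .    .80    .     .     .     .     .     .   1.00    .     .
s5     .     .     .     .    .87    .     .     .     .     .     .   1.00    .
s6     .     .     .     .     .    .92    .     .     .     .     .     .   1.00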


* In such calculations on a larger scale, the methods of Aitken’s
(1937) paper are extremely economical. Triple products of
matrices of the form $XY^{-1}Z$ can thus be obtained in one pivotal
operation (see Appendix, paragraph 12).

3. Features of the sample covariances.—Examination of
this matrix shows the following features:

(1) The specifics of the indirectly selected tests have
remained unchanged. They are still orthogonal to each
other and all the other tests and factors (except each to
its own test), are still of unit variance, and have still the
same covariances with their own tests, though these will
become larger correlations when the tests are restandardized;

(2) The specifics of the directly selected tests have
become oblique common factors, correlated with everything
except the other specifics;

(3) The matrix of the indirectly selected tests is still of
the same rank (here rank 1);

(4) The variances of the factors $g$, $s_1$ and $s_2$ have been
reduced to .47, .70, and .43.
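
Features (1), (2) and (4) can be checked for the factors alone without writing out the whole 13 × 13 matrix. The sketch below uses the identity, which follows from Aitken's formula, that the new covariance of any two variables x and y outside the selected pair is $R_{xy} + a_x' M a_y$, where $M = R_{pp}^{-1} V_{pp} R_{pp}^{-1} - R_{pp}^{-1}$ and $a_x$, $a_y$ are their population covariances with Tests 1 and 2. The helper names are our own.

```python
import math

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

R_pp = [[1.0, 0.72], [0.72, 1.0]]
V_pp = [[0.36, 0.30], [0.30, 0.36]]
R_inv = inv2(R_pp)
RVR = mul2(mul2(R_inv, V_pp), R_inv)
# M = R^-1 V R^-1 - R^-1
M = [[RVR[i][j] - R_inv[i][j] for j in range(2)] for i in range(2)]

def new_cov(R_xy, a_x, a_y):
    """Expected sample covariance of x and y after selection on Tests 1, 2."""
    return R_xy + sum(a_x[i] * M[i][j] * a_y[j]
                      for i in range(2) for j in range(2))

# Population covariances of each factor with Tests 1 and 2.
g = (0.9, 0.8)
s1 = (math.sqrt(1 - 0.81), 0.0)
s2 = (0.0, 0.6)
s3 = (0.0, 0.0)

var_g = new_cov(1.0, g, g)
var_s1 = new_cov(1.0, s1, s1)
var_s2 = new_cov(1.0, s2, s2)
var_s3 = new_cov(1.0, s3, s3)
cov_s1_s2 = new_cov(0.0, s1, s2)
cov_g_s1 = new_cov(0.0, g, s1)

print(round(var_g, 2), round(var_s1, 2), round(var_s2, 2), round(var_s3, 2))
print(round(cov_s1_s2, 2), round(cov_g_s1, 2))
```

The specific of an indirectly selected test, such as $s_3$, keeps its unit variance, while $s_1$ and $s_2$ lose variance and acquire covariances with each other and with $g$.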

An experimenter beginning with this sample, and 
knowing nothing about the factors in the wider population, 
would have no means of knowing these relative variances, 
and would no doubt standardize all his tests. He certainly 
would not think of using factors with other than unit 
variance. And even if he were by a miracle to arrive at 
an analysis corresponding to the last table, with three 
oblique general factors, he would reject it (a) because of 
the negative correlations of some of the factors, and 
(b) because he can reach an analysis with only two common 
factors, and those orthogonal. It is therefore practically 
certain that he will not reach the population factors, at 
least as far as the directly selected tests are concerned. 
His data and his analysis will be as overleaf. The variances 
are all made unity and the covariances converted into 
correlations. The analysis into factors is a new one, not 
derived from the last table. 

4. Appearance of a new factor.—The most noticeable 
change in this sample analysis, as compared with the 
population analysis on page 296, is the appearance of a 
new “ factor ” h linking the directly selected tests, a factor 
which is clearly due entirely to that selection. What 
degree of reality ought to be attributed to it? Does it 
differ from the other factors really, or have they also been 
produced by selection, even in the population, which is 
only in its turn a sample chosen by natural selection from 
past generations?

Analysis in the Sample

       1     2     3     4     5     6     g′    h     s1    s2    s3    s4    s5    s6
1    1.00   .83   .46   .38   .30   .24   .82   .45   .35    .     .     .     .     .
2     .83  1.00   .43   .35   .28   .22   .77   .45    .    .46    .     .     .     .
3     .46   .43  1.00   .26   .21   .16   .56    .     .     .    .83    .     .     .
4     .38   .35   .26  1.00   .17   .13   .46    .     .     .     .    .89    .     .
5     .30   .28   .21   .17  1.00   .11   .37    .     .     .     .     .    .93    .
6     .24   .22   .16   .13   .11  1.00   .29    .     .     .     .     .     .    .96
g′    .82   .77   .56   .46   .37   .29  1.00    .     .     .     .     .     .     .
h     .45   .45    .     .     .     .     .   1.00    .     .     .     .     .     .
s1    .35    .     .     .     .     .     .     .   1.00    .     .     .     .     .
s2     .    .46    .     .     .     .     .     .     .   1.00    .     .     .     .
s3     .     .    .83    .     .     .     .     .     .     .   1.00    .     .     .
s4     .     .     .    .89    .     .     .     .     .     .     .   1.00    .     .
s5     .     .     .     .    .93    .     .     .     .     .     .     .   1.00    .
s6     .     .     .     .     .    .96    .     .     .     .     .     .     .   1.00

Otherwise the analysis is still into one common factor
and specifics. The loadings of the common factor are
less than they were in the population, and this, as our table
of variances and covariances shows, is due to a real
diminution in the variance of the common factor. The
new common factor $g'$ is a component of the old one.

The loadings of $s_1$ and $s_2$ have also sunk, because they
have been in part turned into a new common factor. The
loadings of the other specifics have risen. But this is
entirely because the variance of the tests has sunk due to 
the shrinkage in g, and is not due to any new specifics 
being added. 
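
The loadings of the new factor h follow from the correlation of the two selected tests once the loadings of $g'$ are known. The sketch below uses the rounded sample loadings .82 and .77; the equal division of the residual between the two tests is an assumption, since a factor confined to two tests fixes only the product of its loadings.

```python
import math

r12 = 0.30 / math.sqrt(0.36 * 0.36)   # new correlation of Tests 1 and 2 (.833)
g1, g2 = 0.82, 0.77                   # sample loadings of g' (rounded)

# What g' leaves unexplained in r12 must be carried by the new factor h.
h_product = r12 - g1 * g2
h = math.sqrt(h_product)              # assumed equal loadings in both tests

# The remaining variance of each test goes to its specific.
s1 = math.sqrt(1 - g1 ** 2 - h ** 2)
s2 = math.sqrt(1 - g2 ** 2 - h ** 2)

print(round(h, 2), round(s1, 2), round(s2, 2))
```

Because the inputs are rounded to two decimals, the specific loadings may differ from the tabled values by a unit in the second place.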

All these considerations make it very doubtful indeed 
whether any factors, and any loadings of factors, have 
absolute meaning. They appear to be entirely dependent 
upon the population in which they are measured, and for 
their definition there would be required not only a given 
set of tests and a given technical procedure in analysis, but 
also a given population of persons. 

Professor Thurstone, however, in his new book Multiple
Factor Analysis (1947) gives what he mildly calls “a less 
pessimistic interpretation than Godfrey Thomson’s of the 
factorial results of selection.” 

5. Identity of simple structure factors after univariate 
selection.—In that book, Thurstone discusses in Chapter 
XIX the effects of selection, and shows by examples that 
if a battery of tests yields simple structure with oblique
factors (including, of course, the orthogonal case), then 
after univariate selection the same factors (though at new 
angles with one another) are identified by the new structure, 
which is still simple. 

If, for example, the battery which gives the correlations 
on our page 152, and yields Figure 26 on page 158, has the 
standard deviation of Test 2 reduced to one-half, then by 
the methods described on our pages 296-8 we can calculate 
that the matrix of correlations and communalities becomes : 


        1       2       3       4       5       6
1     .589    .295   -.044   -.140    .366    .000
2     .295    .302    .049    .159    .183    .000
3    -.044    .049    .555    .115    .304    .506
4    -.140    .159    .115    .371   -.087    .000
5     .366    .183    .304   -.087    .439    .322
6     .000    .000    .506    .000    .322    .493


The rank of this matrix is still 3 as it was before selection, 
and three centroid factors are found to have loadings— 


        I       II      III
1     .409    .647    .058
2     .379    .244   -.315
3     .569   -.444    .184
4     .160   -.271   -.522
5     .585    .174    .257
6     .506   -.350    .337
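
That these three centroid factors account for the matrix can be verified by forming the sums of products of the loadings. In the sketch below each row of loadings has been assigned to its test by matching the sum of its squares to the communality in the matrix above, which is an inference on our part; the names are illustrative.

```python
# Centroid loadings on factors I, II, III, one row for each of Tests 1 to 6.
F = [
    [0.409, 0.647, 0.058],
    [0.379, 0.244, -0.315],
    [0.569, -0.444, 0.184],
    [0.160, -0.271, -0.522],
    [0.585, 0.174, 0.257],
    [0.506, -0.350, 0.337],
]

# Correlations and communalities after selection on Test 2.
M = [
    [0.589, 0.295, -0.044, -0.140, 0.366, 0.000],
    [0.295, 0.302, 0.049, 0.159, 0.183, 0.000],
    [-0.044, 0.049, 0.555, 0.115, 0.304, 0.506],
    [-0.140, 0.159, 0.115, 0.371, -0.087, 0.000],
    [0.366, 0.183, 0.304, -0.087, 0.439, 0.322],
    [0.000, 0.000, 0.506, 0.000, 0.322, 0.493],
]

def reproduced(i, j):
    """Sum of products of the loadings of tests i and j."""
    return sum(x * y for x, y in zip(F[i], F[j]))

worst = max(abs(reproduced(i, j) - M[i][j])
            for i in range(6) for j in range(6))

print(round(reproduced(0, 0), 3), round(reproduced(0, 1), 3), round(worst, 4))
```

The largest discrepancy is about one unit in the third decimal place, so that no fourth factor is called for.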


When these are “extended” in the manner of our page 157
and a diagram like Figure 26 made, we obtain Figure 32.
It is still a triangle, and although its measurements are
different, the same tests are found defining each side as
before. The corners of the triangle may, with Professor
Thurstone, reasonably be claimed to represent the same
factors as before selection, although their correlations have
changed.

Figure 32.

The plane of Figure 32 is not the same as the plane of
Figure 26, being at right angles to a different first centroid.
When adjustment is made for this, as Professor Thurstone
has presumably done in his chapter (though, I protest,
without sufficient explanation), then the directly selected
test point has not moved, while the other points have
moved radially away from or towards it.
If the above matrix of centroid loadings is postmultiplied
by the rotating matrix obtained from the diagram,
viz.—

   .721    .443    .641
  -.499   -.201    .744
   .480   -.874   -.190

we obtain the new simple structure on the reference vectors,

         A       B       C
1        .       .     .732
2        .     .394    .484
3      .562    .180      .
4        .     .472      .
5      .459      .     .455
6      .702      .       .

1 
If this is compared with the table on page 154 it will be
seen that the zeros are in the same places, although the 
non-zero entries have altered (except in Test 6, which was 
uncorrelated with the directly selected Test 2, and therefore 
is unaffected in composition). 

If the correlations between the factors are calculated by 
the method of pages 181-2, factor A is found to be still 
uncorrelated with B and C, but these last two have a 
correlation coefficient of −.3: that is, they are no longer
orthogonal but at an obtuse angle of about 107½°.

6. Multivariate selection and simple structure.—But 
though Thurstone must, I think, be granted his claim that 
univariate selection will not destroy the identity of his 
oblique simple structure factors, but only change their 
intercorrelations, the situation would seem to be very 
different with multivariate selection. 

Multivariate selection is not the same thing as repeated 
univariate selection. The latter will not change the rank 
of the correlation matrix with suitable communalities, nor 
will it change the position of zero loadings in simple struc- 
ture. Repeated univariate selection will, it is true, cause 
all the correlations to alter, but only indirectly and in such 
a way as to preserve rank, simple structure, and factor 
identity. 

But in multivariate selection it is envisaged that the 
correlation between two variables may itself be directly 
selected, and caused to have a value other than that which 
would naturally follow from the reduction of standard 
deviation in two selected variables. Selection for correla- 
tion is just as easily imagined as is selection for scatter. 
Indeed, in natural selection it is possibly even commoner. 

Once we select for the correlations, however, as well as
for scatter, new “factors” emerge, old ones change. In
this chapter we have supposed a small part $R_{pp}$ of the whole
correlation matrix to be changed to $V_{pp}$, and found that
one new factor is created (page 300) or, indeed, two new
oblique factors (page 298). We might have supposed $R_{pp}$ to
be a larger portion of R; and there is nothing to prevent
us supposing selection to go on for the whole of R, and
writing down a brand-new table of coefficients whose
factors would be quite different from those of the original
table. In our example of page 152, for instance,
where the three oblique “factors” coincided in direction
with the communal parts of Tests 1, 4, and 6, there is 
nothing to prevent us from writing down, as having 
been produced by selection, a new set of correlation coeffici- 
ents whose analysis would identify the “ factors ” with the 
communal parts of Tests 2, 3, and 5. In fact, all we would 
have to do would be to renumber the rows and columns on 
page 152. Such fundamental changes could be produced 
by selection: and perhaps they have been, for natural 
selection has had plenty of time at its disposal. 

Professor Thurstone (his page 458, footnote, in Multiple 
Factor Analysis) classes the new factors produced by 
selection as “ incidental factors (which) can be classed 
with the residual factors, which reflect the conditions of 
particular experiments.” But we can hardly dismiss 
them thus easily if, as is conceivable, they have become 
the main or perhaps the only factors remaining, the others 
having disappeared ! 

It may be admitted at once, however, that the actual
amount of selection from psychological experiment to 
psychological experiment is not likely to make such 
alarming changes in factors. For the use to which factors 
are likely to be put in our age, in our century or more, they 
are like to be independent enough of such selection as can 
go on in that time, and in that sense Professor Thurstone 
is justified in his thesis. Nor am I one to deny “ reality ” 
to any quality merely because it has been produced by 
selection, and may not abide for all time. 


CHAPTER XX
THE SAMPLING THEORY


1. Two views. A hierarchical example as explained by one 
general factor. The advance of the science of factorial 
analysis of the mind to its present position has not taken 
place without controversy, and it is the purpose of the present
chapter to give a preliminary description of some
objections which have been frequently raised by the
present writer (Thomson, 1916, 1919a, 1935b, etc.) which
he still holds to. 

The contrast between the factorial point of view and 
Thomson’s sampling theory can be best seen by consider- 
ing the explanation of the same set of correlation coefficients 
by both views. To simplify the argument we shall take 
in the first place a set of correlation coefficients whose 
tetrads are exactly zero, which can therefore be completely 
“explained” by a general factor g and specifics, as in this
table:

          1       2       3       4
  1       .     .746    .646    .527
  2     .746      .     .577    .471
  3     .646    .577      .     .408
  4     .527    .471    .408      .

We can more exactly follow the argument if we employ
the vulgar fractions of which these are the decimal
equivalents, namely the following, each divided by 6:

          1       2       3       4
  1       .      √20     √15     √10
  2      √20      .      √12     √8
  3      √15     √12      .      √6
  4      √10     √8      √6       .

In this form the tetrad-differences are all obviously zero
by inspection. These correlations can therefore be explained
by one general factor, as in Figure 33, which gives
them exactly.
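
The vanishing of the tetrad-differences can be verified numerically. The following is a minimal Python sketch using the six fractions given above; the dictionary layout and the function names are illustrative assumptions and not the author's notation. Since every correlation is a square root divided by 6, each product of two "opposite" correlations reduces to √120/36, so the three differences are zero.

```python
import math

# Off-diagonal correlations of the example: sqrt(numerator) / 6.
numerators = {(1, 2): 20, (1, 3): 15, (1, 4): 10,
              (2, 3): 12, (2, 4): 8, (3, 4): 6}
r = {pair: math.sqrt(n) / 6 for pair, n in numerators.items()}

def corr(i, j):
    """Correlation between tests i and j (order of arguments immaterial)."""
    return r[(min(i, j), max(i, j))]

def tetrad_differences(a, b, c, d):
    """The three tetrad-differences that four tests can yield."""
    return (
        corr(a, b) * corr(c, d) - corr(a, c) * corr(b, d),
        corr(a, b) * corr(c, d) - corr(a, d) * corr(b, c),
        corr(a, c) * corr(b, d) - corr(a, d) * corr(b, c),
    )

tetrads = tetrad_differences(1, 2, 3, 4)
print([abs(t) < 1e-12 for t in tetrads])  # [True, True, True]
```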


We have here a general factor of variance 30 which is
the sole cause of the correlations, and specific factors of
variances 6, 15, 30, and 60. The variances of the four
“tests” are 36, 45, 60, and 90. The “communalities”
and “specificities” are:

  Test             1      2      3      4
  Communality     5/6    4/6    3/6    2/6
  Specificity     1/6    2/6    3/6    4/6

[Figure 33. Figure 34.]

These communalities can be calculated from the correlation
coefficients, for it will be remembered (Chapter I,
Section 4) that when tetrad-differences are exactly zero,
each correlation coefficient can be expressed as the
product of two correlation coefficients with g (two
“saturations”). Thus

    $r_{12} = r_{1g} r_{2g}$
    $r_{13} = r_{1g} r_{3g}$
    $r_{23} = r_{2g} r_{3g}$

Therefore

    $\dfrac{r_{12} r_{13}}{r_{23}} = \dfrac{(r_{1g} r_{2g})(r_{1g} r_{3g})}{r_{2g} r_{3g}} = r_{1g}^2$,

the square of the saturation of Test 1 with g. And when
there is only one common factor, the square of its saturation
is the communality.

The quantity $r_{12} r_{13} / r_{23}$, therefore, means, on this theory
of one common factor, the communality, or square of the
saturation with g, of the first test. Its value in our
example is 30/36, or five-sixths.
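
As a check on this arithmetic, the Python sketch below recovers all four communalities from the correlations alone, on the one-common-factor assumption stated above. The helper name `communality` and the choice of which two companion tests to use for each test are illustrative assumptions; any other pair of companions would give the same value when the tetrad-differences are exactly zero.

```python
import math

# Correlations of the example: sqrt(numerator) / 6.
numerators = {(1, 2): 20, (1, 3): 15, (1, 4): 10,
              (2, 3): 12, (2, 4): 8, (3, 4): 6}

def corr(i, j):
    return math.sqrt(numerators[(min(i, j), max(i, j))]) / 6

def communality(i, j, k):
    """Square of the saturation of test i with g, from r_ij * r_ik / r_jk."""
    return corr(i, j) * corr(i, k) / corr(j, k)

communalities = [
    communality(1, 2, 3),
    communality(2, 1, 3),
    communality(3, 1, 2),
    communality(4, 1, 2),
]
saturations = [math.sqrt(h2) for h2 in communalities]
specificities = [1 - h2 for h2 in communalities]
print([round(h2, 4) for h2 in communalities])  # [0.8333, 0.6667, 0.5, 0.3333]
```

The first value agrees with 30/36 obtained directly from the variances of g and of Test 1.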

2. The alternative explanation. The sampling theory.
The alternative theory to explain the zero tetrad-differences
is that each test calls upon a sample of the bonds
which the mind can form, and that some of these bonds are
common to two tests and cause their correlation. In the
present instance we have arranged this artificial example
so that the tests can be looked upon as samples of a very
simple mind, which can form in all 108 bonds (or some
multiple of 108).* The first test uses five-sixths of these
(or 90), the second test four-sixths (or 72), the third three-sixths
(54), and the fourth two-sixths (or 36). These
fractions are the same in value as the communalities of
the former theory. Each of them may be called the
“richness” of the test. Thus Test 1 is most rich, and
draws upon five-sixths of the whole mind. The fractions
5/6, 4/6, 3/6, 2/6, which in the former theory were “communalities,”
are in the sampling theory “coefficients of richness.”
They formerly indicated the fraction of each test’s
variance supplied by g; they indicate here the fraction
which each test forms of the whole “mind” (but see later,
concerning “sub-pools”).

* There is nothing mysterious about the number 108. It is
chosen merely because it leads to no fractions in the diagram.
Any large number would do.

Now if our four tests use respectively 90, 72, 54, and 36
of the available bonds of the mind, as indicated in Figure
34, then there may be almost any kind of overlap between
two of the tests. Any of the cells of the diagram may have
contents, instead of all being empty except for g and the
specifics. If we know nothing more about the tests except
the fractions we have called their “richnesses,” we cannot
tell with certainty what the contents of each cell will be;
but we can calculate what the most probable contents will
be. If the first test uses five-sixths and the second test
four-sixths of the mind’s bonds, it is most probable that
there will be a number of bonds common to both tests
equal to $\frac{5}{6} \times \frac{4}{6}$, or 20/36ths of the total number. That is,
the four cells marked a, b, c, d in the diagram, the cells
common to Tests 1 and 2, will most likely contain

    $\frac{20}{36} \times 108 = 60$ bonds

between them. By an extension of the same principle we
can find the most probable number in each cell. Thus c,
the number of bonds used in all four of the tests, is most
probably

    $\frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{2}{6} \times 108 = 10$ bonds.

In this way we reach the most probable pattern of
overlap of the four tests shown in Figure 35. And this
diagram gives exactly the same correlations as did Figure 33.
Let us try, for example, the value of $r_{23}$ in each diagram.
In Figure 33 we had

    $r_{23} = \dfrac{30}{\sqrt{45 \times 60}} = \dfrac{\sqrt{12}}{6} = .577$

In Figure 35 the same correlation is

    $r_{23} = \dfrac{20 + 10 + 4 + 2}{\sqrt{72 \times 54}} = \dfrac{\sqrt{12}}{6} = .577$
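
The "most probable" contents of every cell can be tabulated mechanically. The Python sketch below applies the product rule described above to all sixteen combinations of "used" and "not used" by the four tests, and then forms a correlation as overlap divided by the geometric mean of the two totals. The variable and function names are illustrative assumptions; only the pool of 108 bonds and the four richnesses come from the text.

```python
from fractions import Fraction
from itertools import product

N = 108  # size of the pool of bonds in the example
richness = [Fraction(5, 6), Fraction(4, 6), Fraction(3, 6), Fraction(2, 6)]

def expected_cell(membership):
    """Most probable number of bonds lying in exactly the tests flagged True."""
    count = Fraction(N)
    for p, used in zip(richness, membership):
        count *= p if used else 1 - p
    return count

# One cell for every pattern of membership in the four tests.
cells = {m: expected_cell(m) for m in product([True, False], repeat=4)}

def overlap(i, j):
    """Bonds common to tests i and j (0-based), summed over the cells holding both."""
    return sum(n for m, n in cells.items() if m[i] and m[j])

def total(i):
    """All bonds used by test i."""
    return sum(n for m, n in cells.items() if m[i])

def correlation(i, j):
    return float(overlap(i, j)) / float(total(i) * total(j)) ** 0.5

print(overlap(0, 1), cells[(True, True, True, True)], overlap(1, 2))  # 60 10 36
print(round(correlation(1, 2), 3))  # 0.577
```

The four cells common to Tests 2 and 3 hold 20, 10, 4 and 2 bonds, which is the sum appearing in the numerator for Figure 35.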
This form of overlap, therefore, will give zero
tetrad-differences, just as the theory of one general factor did.
More exactly, this sampling theory gives zero tetrad- 
differences as the most probable (though not the certain) 
connexion to be found between correlation coefficients 
(Thomson, 1919a) if the sampling of causes is random. 

If we let $p_1$, $p_2$, $p_3$, and $p_4$ represent fractions which the
four tests form of the whole pool of N bonds of the mind,
then the number common to the first two tests will most
probably be $p_1 p_2 N$, and the correlation between the tests

    $r_{12} = \dfrac{p_1 p_2 N}{\sqrt{p_1 N \times p_2 N}} = \sqrt{p_1 p_2}$

We therefore have, in any tetrad, quantities like the
following:

              3                    4
  1    $\sqrt{p_1 p_3}$    $\sqrt{p_1 p_4}$
  2    $\sqrt{p_2 p_3}$    $\sqrt{p_2 p_4}$

and the tetrad-difference is, most probably (Thomson,
1927a, 253)

    $\sqrt{p_1 p_3 p_2 p_4} - \sqrt{p_1 p_4 p_2 p_3} = 0$

This may be expressed by saying that the laws of probability
alone will cause a tendency to zero tetrad-differences
among correlation coefficients. In another form this
statement can be worded thus: The laws of probability or 
chance cause any matrix of correlation coefficients to tend 
to have rank 1, or at least to tend to have a low rank (where 
by rank we mean the maximum order among those non- 
vanishing minors which avoid the principal diagonal 
elements). 

It is, in the opinion of the present writer, this fact—a 
result of the laws of chance and not of any psychological 
laws—which has made conceivable the analysis of mental 
abilities into a few common factors (if not into one only, 
as Spearman hoped) and specifics. Because of the laws
of chance the mind works as if it were composed of these
hypothetical factors g, b, n, etc., and a number of specific
factors. The causes may be “anarchic,” meaning that
they are numerous and unconnected, yet the result is
“monarchic,” or at least “oligarchic,” in the sense that
it may be so described, provided always that large specific
factors are allowed.

3. Specific factors maximized. The specific factors play,
in the usual methods of factorization, an important rôle,
and our present example can be used to illustrate the fact,
which is not usually realized, that all these methods
maximize the specifics (Thomson, 1938c) by their insistence
on minimizing the number of common factors. In Figure
33, of the whole variance of 4, the specific factors contribute
1.667, or 41.7 per cent. In Figure 35, they contribute
only

    $\frac{10}{90} + \frac{4}{72} + \frac{2}{54} + \frac{1}{36} = .2315$, or 5.8 per cent.
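
The two totals can be reproduced as follows. This Python sketch is an illustration under stated assumptions: for the sampling analysis the specific part of a test is taken to be the bonds used by that test and by no other, their most probable number being found by the same product rule as before. The counts so obtained are computed here rather than quoted from the text, and the names are hypothetical.

```python
from fractions import Fraction

# One-general-factor analysis (Figure 33): g has variance 30 and the
# specifics have variances 6, 15, 30 and 60.
g_variance = 30
specific_variances = [6, 15, 30, 60]
one_factor_total = sum(Fraction(s, g_variance + s) for s in specific_variances)

# Sampling analysis (Figure 35): pool of 108 bonds, richnesses 5/6 ... 2/6.
N = 108
richness = [Fraction(5, 6), Fraction(4, 6), Fraction(3, 6), Fraction(2, 6)]

def bonds_unique_to(i):
    """Most probable number of bonds used by test i and by no other test."""
    count = N * richness[i]
    for j, p in enumerate(richness):
        if j != i:
            count *= 1 - p
    return count

sampling_total = sum(bonds_unique_to(i) / (N * richness[i]) for i in range(4))

def percent_of_whole(specific_total, n_tests=4):
    """Share of the n_tests units of standardized variance, in per cent."""
    return round(float(100 * specific_total / n_tests), 1)

print(float(one_factor_total), percent_of_whole(one_factor_total))
print(float(sampling_total), percent_of_whole(sampling_total))
```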

Apart from certain trivial exceptions which do not occur 
in practice, it is generally true that minimizing the number 
of common factors maximizes the variance of the specifics.
Innumerable other equivalent analyses of the above correlations
can be made, but they all give a variance to
the specifics which is less than 1.667. Here, for example,
in Figure 36 (page 308), is an analysis which has no general
factor but six other common factors, and which gives a
total specific variance of .3056, or 7.6 per cent.

Now specific factors are undoubtedly a difficulty in any
analysis, and to have the specific factors made as large and 
important as possible is a heavy price to pay for having as 
few common factors as possible. 

That specific factors are a difficulty seems to be recog- 
nized by Thurstone. “The specific variance of a test,” he 
writes (Vectors, 68), “should be regarded as a challenge,” 
and he looks forward to splitting a specific factor up into 
group factors by brigading the test in question with new 
companion tests in a new battery. It seems clear that 
the dissolution of specifics into common factors is unlikely 
to happen if each analysis is conducted on the principle of 
making the specific variances as large as possible. We
must, however, leave this point here, to return to it later. 

4. Sub-pools of the mind.—A difficulty which will occur 
to the reader in connexion with the sampling theory is that, 
when the correlation between two tests is large, it seems to 
imply that each needs nearly the whole mind to perform 
it (Spearman, 1928, 257). In our example the correlation 
between Tests 1 and 2 was .746, a correlation not infrequently
reached between actual tests. It is, for instance,
almost exactly the correlation reported by Alexander
between the Stanford-Binet test and the Otis
Self-administering test (Alexander, 1935, Table XVI). Does
this, then, mean that each of these tests requires the 
activity of about four-sixths or five-sixths of all the 
“bonds” of the brain? Not necessarily, even on the 
sampling theory. These two tests are not so very unlike 
one another, and may fairly be described as sampling the 
same region of the mind rather than the whole mind, so 
that they may well include a rather large proportion of the 
bonds found in that region. They may be drawn, that is, 
from a sub-pool of the mind’s bonds rather than from the 
whole pool (Thomson, 1935b, 91; Bartlett, 1937a, 102).
Nor need the phrase “region of the mind” necessarily
mean a topographical region, a part of the mind in the
same sense as Yorkshire is part of England. It may mean
something, by analogy, more like the lowlands of England,
all the land easily accessible to everybody, lying below,
say, the 300-foot contour line. What the “bonds” of the
mind are, we do not know. But they are fairly certainly 
associated with the neurones or nerve cells of our brains, 
of which there are probably round about ten thousand 
million in each normal brain. Thinking is accompanied 
by the excitation of these neurones in patterns. The 
simplest patterns are instinctive, more complex ones
acquired. Intelligence is possibly associated with the
number and complexity of the patterns which the brain
can (or could) make. A “region of the mind” in the
above paragraph may be the domain of patterns below a
certain complexity, as the lowlands of England are below
a certain contour line. Intelligence tests do not call upon
brain patterns of a high degree of complexity, for these
are always associated with acquired material and with the 
educational environment, and intelligence tests wish to 
avoid testing acquirement. It is not difficult to imagine 
that the items of the Stanford-Binet test call into some 
sort of activity nearly all the neurones of the brain, though 
they need not thereby be calling upon all the patterns 
which those neurones can form. When a teacher is 
demonstrating to an advanced class that “a quadratic 
form of rank 2 is identically equal to the product of 
two linear forms,” he is using patterns of a complexity far 
greater than any used in answering the Binet-Simon items.
But the neurones which form these patterns may not be
more numerous. Those complicated patterns, however,
are forbidden to the intelligence tester, for a very intelligent
man may not have the ghost of an idea what a “quadratic
form” is. Within the limits of the comparatively simple
patterns of the brain which they evoke, it seems very
possible that the two tests in question call upon a large
proportion of these, and have a large number in common.

As has been indicated, the author is of opinion that
the way in which they magnify specific factors is the 
weak side of the theories of a few common factors. That 
does not mean, however, that a description of a matrix of 
correlations in terms of these theories is inexact. Men 
undoubtedly do perform mental tasks as if they were doing 
so by means of a comparatively small number of group 
factors of wide extent, and an enormous number of specific 
factors of very narrow range but of great importance each 
within its range. Whether a description of their powers in 
terms of the few common factors only is a good description 
depends in large measure on what purpose we want the 
description to subserve. The practical purpose is usually 
to give vocational or educational advice to the man or to 
his employers or teachers, and factors, though they cannot 
improve and indeed may blur the accuracy of vocational 
estimates, may, however, facilitate them where otherwise 
they would have been impossible, as money facilitates 
trade where barter is impossible. 

As a theoretical account of each man’s mind, however,
the theories which use the smallest number of common
factors seem to have drawbacks. They can give an exact 
reproduction of the correlation coefficients. But, because 
of their large specific factors, they do not enable us to give 
an exact reproduction of each man’s scores in the original 
tests, so that much information is being lost by their 
use. 

It will be seen from considerations such as these that 
alternative analyses of a matrix of correlations, even 
although they may each reproduce the correlation coeffi- 
cients exactly, may not be equally acceptable on other 
grounds. The sampling theory, and the single general 
factor theory, can both describe exactly a hierarchical set 
of correlation coefficients, and they both give an explana- 
tion of why approximately hierarchical sets are found in 
practice. In a mathematical sense, they are alternatives.
But we cannot keep both as realities, though we may
employ either mathematically.

5. The inequality of men.—Professor Spearman opposed 
the sampling theory chiefly on the ground that it would 
make all correlations equal (and zero), and involve the 
further consequence that all men are equal in their average
attainments (Abilities, 96), if the number of elementary
bonds is large, as the sampling theory requires. Both
these objections, however, arise from a misunderstanding
of the sampling theory, in which a sample means “some
but not all” of the elementary bonds (Thomson, 1935b,
72, 76). As has been explained, tests can differ, on this 
theory, in their richness or complexity, and less rich tests 
will tend to have low, more complex tests will tend to have 
high correlations, at any rate if the “ bonds ” tend to be 
all-or-none in their nature, as the action of neurones is 
known to be. And as for the assertion that the theory
makes all men equal, there is no basis whatever for the
suggestion that it assumes every man to have an equal
chance of possessing every element or bond. On the contrary,
the sampling theory would consider men also to be
samples, each man possessing some, but not all, both of the
inherited and the acquired neural bonds which are the
physical side of thought. Like the tests, some men are
rich, others poor, in these bonds. Some are richly endowed
by heredity, some by opportunity and education; some
by both, some by neither. The idea that men are samples
of all that might be, and that any task samples the powers 
which an individual man possesses, does not for a moment 
carry with it the consequences asserted of equal correlations 
and a humdrum mediocrity among human kind. 

6. Negative and positive correlations.*—The great major- 
ity of correlation coefficients reported in both biometric 
and psychological work are positive. This almost certainly 
represents an actual fact, namely that desirable qualities 
in mankind tend to be positively correlated ; for though 
reported correlations may be selected by the unconscious 
prejudices of experimenters, who are usually on the look- 
out for things which correlate positively, yet as those who 
have tried know, it is really very difficult to discover 
negative correlations between mental tests. Besides, even 
in imagination we cannot make a race of beings with 
predominantly negative correlations. A number of lists 
of the same persons in order of merit can be all very like 
one another, can indeed all be identical, but they cannot 
all be the opposite of one another. If Lists a and b are 
the inverse of one another, List c, if it is negatively 
correlated with a, will be positively correlated with b. 
Among a number n of variates, it is logically possible to 
have a square table of correlation coefficients each equal 
to unity ; that is, an average correlation of unity. But 
the farthest the average correlation can be pushed in the 
negative direction is -1/(n - 1). That is, if n is large,
the average correlation can range from + 1 to only very 
little below zero. Even Mother Nature, then, by natural 
selection or by any other means, could not endow man 
with abilities which showed both many and large negative 
correlations. If they were many, they would have to be 
very small; if they were large, they would have to be 
very few. 
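
The limit -1/(n - 1) can be seen from the variance of a sum. If n variates are each put in standard measure, the variance of their sum is n plus the sum of all the off-diagonal correlations, that is n + n(n - 1) times the average correlation, and a variance cannot be negative. The Python sketch below merely evaluates that expression at the limit; the function names are illustrative assumptions.

```python
from fractions import Fraction

def lowest_average_correlation(n: int) -> Fraction:
    """Smallest possible mean of the off-diagonal correlations among n variates."""
    return Fraction(-1, n - 1)

def variance_of_sum(n: int, average_r: Fraction) -> Fraction:
    """Variance of the sum of n standardized variates whose
    off-diagonal correlations average to average_r."""
    return n + n * (n - 1) * average_r

for n in (2, 3, 10, 100):
    bound = lowest_average_correlation(n)
    # At the bound the variance of the sum is exactly zero; any lower
    # average would make it negative, which no set of scores can give.
    print(n, bound, variance_of_sum(n, bound))
```

With two variates the limit is -1, as for two lists which are the inverse of one another; with a hundred it is only about -.01.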

* This section refers to correlations between tests. The greater
frequency of negative correlations between persons has already been
discussed in Chapter XVI, Section 8.

Natural selection has probably tended, on the whole, to
favour positive correlations within the species.* In the case
of some physical organs it is obvious that a high positive 
correlation is essential to survival value—for example, 
between right and left leg, or between legs and arms. In 
these cases of actual paired organs, however, it is doubtless 
more than a mere figure of speech to speak of a common 
factor as the cause. Between organs not simply related 
to one another, as say eyes and nose, natural selection, 
if it tended towards negative correlation, would probably 
split the genus or species into two, one relying mainly on 
eyesight, the other mainly on smell. Within the one 
species, since it is mathematically easier to make positive 
than negative correlations, it seems likely that the former 
would largely predominate. To say that this was due to 
a general factor would be to hypostatize a very complex 
and abstract cause. To use a general factor in giving a 
description of these variates is legitimate enough, but is, 
of course, nothing more than another way of saying that 
the correlations are mainly positive, if, as is the case, most
people mean by a general factor one which helps in every
case, not an interference factor which sometimes helps and
sometimes hinders.

* An important kind of natural selection is the selection of one sex
by the other in mating. Dr. Bronson Price (1936) has pointed out
that positive cross-correlation in parents will produce positive correlation
in the offspring. Price further shows that this positive cross-correlation
in the parents will result if the mating is highly homogamous
for total or average goodness in the traits, a conclusion which,
it may be remarked here, can be easily seen by using the pooling
square described in our Chapter XIV. Price concludes: “The
intercorrelations which g has been presumed to illumine are seen
primarily as consequences of the social and therefore marital
importance which has attached to the abilities concerned.” Price
in his argument makes use of formulæ from Sewall Wright (1921).
M. S. Bartlett, in a note on Price’s paper (Bartlett, 1937b), develops
his argument more generally, also using Wright’s formulæ, and says:
“Price contrasts the idea of elementary genetic components with
factor theories. It should, however, be pointed out that a
statistical interpretation of such current theories can be and has been
advocated. Thomson has, for example, shown . . .” and here
follows a brief outline of the sampling theory. “On the basis of
Thomson’s theory,” Bartlett adds, “I have pointed out (Bartlett,
1937a) that general and specific abilities may naturally be defined
in terms of these components, and that while some statistical
interpretation of these major factors seems almost inevitable, this
may not in itself render their conception invalid or useless.”

7. Low reduced rank.—It is, however, on the tendency 
to a low reduced rank in matrices of mental correlations 
that the theory of factors is mainly built. It has very 
much impressed people to find that mental correlations 
can be so closely imitated by a fairly small number of 
common factors. Ignoring the host of large specific factors 
to which this view commits them, they have concluded 
that the agreement was so remarkable that there must be 
something in it. There is; but it is almost the opposite of 
what they think. Instead of showing that the mind has 
a definite structure, being composed of a few factors which 
work through innumerable specific machines, the low rank 
shows that the mind has hardly any structure. If the 
early belief that the reduced rank was in all cases one had 
been confirmed, that would indeed have shown that the 
mind had no structure at all but was completely undiffer- 
entiated. It is the departures from rank 1 which indicate 
structure, and it is a significant fact that a general tendency 
is noticeable in experimental reports to the effect that 
batteries do not permit of being explained by as small a 
number of factors in adults as in children, probably because 
in adults education and vocation have imposed a structure 
on the mind which is absent in the young. 

By saying that the mind has little structure, nothing 
derogatory is meant. The mind of man, and his brain, too, 
are marvellous and wonderful. All that is meant by the 
absence of structure is the absence of any fixed or strong 
linkages among the elements (if the word may for a moment 
be used without implications) of the mind, so that any 
sample whatever of those elements or components can be 
assembled in the activity called for by a “ test.” 

Not that there is any necessity to suppose that the mind 
is composed of separate and atomic elements. It is pos- 
sibly a continuum, its elements if any being more like 
the molecules of a dissolved crystalline substance than 
like grains of sand. The only reason for using the word 
“elements” is that it is difficult, if not impossible, to speak
of the different parts of the mind without assuming some
“items” in terms of which to think. For concreteness it
is convenient to identify the elements, on the mental side,
with something of the nature of Thorndike’s “bonds,”
and on the bodily side with neurone arcs ; in the remainder 
of this chapter the word “ bonds ” will be used. But 
there is no necessity beyond that of convenience and 
vividness in this. The “bonds” spoken of may be 
identified by different readers with different entities. All 
a “ bond ” means, is some very simple aspect of the causal 
background. Some of them may be inherited, some may 
be due to education. There is no implication that the 
combined action of a number of them is the mere sum of 
their separate actions. There is no commitment to 
“ mental atomism.” 

If, now, we have a causal background comprising in- 
numerable bonds, and if any measurement we make can 
be influenced by any sample of that background, one 
measurement by this sample and another by that, all 
samples being possible ; and if we choose a number of 
different measurements and find their intercorrelations, 
the matrix of these intercorrelations will tend to be 
hierarchical, or at least tend to have a low reduced rank. 
This has nothing to do with the mind: it is simply a
mathematical necessity, whatever the material used to 
illustrate it. 

8. A mind with only six bonds.—We shall illustrate this 
fact first by imagining a “mind” which can form only 
six “bonds,” which mind we submit to four “tests”
which are of different degrees of richness, the one requiring 
the joint action of five bonds, the others of four, three, and 
two respectively (Thomson, 1927b). These four tests will 
(when we give them to a number of such minds) yield 
correlations with one another. For we shall suppose the 
different minds not all to be able to form all six of the 
possible bonds, some individuals possessing all six, others 
possessing smaller numbers. 

We have only specified the richness of each test, but 
have not said which bonds form each ability. There may, 
therefore, be different degrees of overlap between them,
though some will be more frequent than others if we form
all the possible sets of four tests which are of richness five,
four, three, and two. If we call the bonds a, b, c, d, e,
and f, then one possible pattern of overlap would be the
following:

  Test | Bonds
    1  | a  b  c  d  e  .
    2  | .  b  c  d  e  .
    3  | .  .  c  d  .  f
    4  | .  .  .  d  e  .


If we for further simplicity suppose these bonds to be
equally important, and use the formula

    Correlation = overlap / (geometrical mean of the two totals)

we can calculate the correlations which these four tests
would give, namely:

           1        2        3        4
  1        .      4/√20    2/√15    2/√10
  2      4/√20      .      2/√12    2/√8
  3      2/√15    2/√12      .      1/√6
  4      2/√10    2/√8     1/√6       .

and we notice that in this particular pattern all three 
tetrad-differences are zero. However, if we picked our 
four tests at random (taking care only that they were of 
these degrees of richness) we would not always or often get 
the above pattern : in point of fact, we would get it only 
12 times in 450. Nevertheless, it is one of the most prob- 
able patterns. In all, 78 different patterns of the bonds 
are possible—always adhering to our five, four, three, and 
two—the probability of each pattern ranging from 12 in 
450 down to 1 in 450. 
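
The overlap formula can be checked by brute force in so small a "mind." The Python sketch below rests on assumptions that are labelled in the code and are not the author's own procedure: the four bond sets are one assignment chosen to give overlaps of 4, 2, 2, 2, 2 and 1; every one of the 64 possible minds (each bond present or absent) is supposed equally frequent; and a person's score in a test is the number of that test's bonds which he possesses. On those assumptions the ordinary product-moment correlations between the scores equal the overlap correlations, and the tetrad-differences of this pattern vanish.

```python
from itertools import product
from math import sqrt

# Hypothetical bond sets with richnesses 5, 4, 3, 2 (an assumed assignment
# giving pairwise overlaps 4, 2, 2, 2, 2, 1).
tests = {1: set("abcde"), 2: set("bcde"), 3: set("cdf"), 4: set("de")}

def overlap_correlation(i, j):
    """Overlap divided by the geometric mean of the two totals."""
    return len(tests[i] & tests[j]) / sqrt(len(tests[i]) * len(tests[j]))

# Assumption: all 2**6 minds are equally frequent.
bonds = "abcdef"
minds = [set(b for b, has in zip(bonds, flags) if has)
         for flags in product([0, 1], repeat=6)]

# Assumption: a score is the count of the test's bonds that the mind has.
scores = {t: [len(mind & needed) for mind in minds] for t, needed in tests.items()}

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def r(i, j):
    return pearson(scores[i], scores[j])

tetrads = (
    r(1, 2) * r(3, 4) - r(1, 3) * r(2, 4),
    r(1, 2) * r(3, 4) - r(1, 4) * r(2, 3),
    r(1, 3) * r(2, 4) - r(1, 4) * r(2, 3),
)
print(round(r(1, 2), 4), round(overlap_correlation(1, 2), 4))
```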
It is possible to calculate the tetrad-differences for each
one of the 78 possible patterns of overlap which can occur. 
When we then multiply each pattern by the expected fre- 
quency of its occurrence in 450 random choices of the four 
tests, we get 450 values for each tetrad-difference, distri- 
buted as follows : 


Values of          Frequency of
F × √120         F1     F2     F3


8 | | 2 
7 | 2 0 
6 8 | 14 

9 2 6 


0 99 | 54 | 81 
Sn 56 | 78 | 86 
—2 ey 42 42 
= 16 | 30 | 60 
a 30 | 86 | 18 
Sag OOP sO 
—6 4 12 e 8 


450 | 450 | 450 


Although the distribution of each F about zero is slightly 
irregular, the average value of each F is exactly zero. For 
$F_1$ the variance is

    $\sigma^2 = \dfrac{2{,}164}{120 \times 450} = .040$

We see, then, that in this universe of very primitive- 
minded men, whose brains can form only six bonds, four 
tests which demanded respectively five, four, three, and 
two bonds would give tetrad-differences whose expected 
value would be zero, the values actually found being 
grouped around zero with a certain variance. There is no 
particular mystery about the four “richnesses” five, four,
three, and two, by the way. We might have taken any
“richnesses” and got a similar result. If there are no
linkages among the bonds, the most probable value of a
tetrad-difference will always be zero; and if all possible
combinations of the bonds are taken, the average of all the
tetrad-differences will be zero. With only six bonds in the
“mind,” however, the scatter on both sides of zero will be
considerable, as the above value of the standard deviation
of $F_1$ shows, viz.

    $\sigma = \sqrt{.040} = .20$
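
With only six bonds the whole calculation can be repeated by direct enumeration. The Python sketch below forms every possible set of four tests of richness five, four, three, and two (27,000 sets, each taken as equally likely, which is 60 times the 450 of the text) and tabulates one tetrad-difference. It is assumed here that $F_1$ is the difference $r_{13} r_{24} - r_{14} r_{23}$; that identification is not stated in this section and is adopted because it reproduces the variance quoted above. Multiplying by √120 turns the tetrad-difference into an integer difference of overlap products.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

bonds = range(6)
sizes = (5, 4, 3, 2)  # richness of Tests 1 to 4

def choices(k):
    """Every way of drawing a test of k bonds from the six."""
    return [frozenset(c) for c in combinations(bonds, k)]

distribution = Counter()
for t1 in choices(sizes[0]):
    for t2 in choices(sizes[1]):
        for t3 in choices(sizes[2]):
            for t4 in choices(sizes[3]):
                # (r13*r24 - r14*r23) * sqrt(120), using overlap correlations
                value = len(t1 & t3) * len(t2 & t4) - len(t1 & t4) * len(t2 & t3)
                distribution[value] += 1

total = sum(distribution.values())
mean = Fraction(sum(v * n for v, n in distribution.items()), total)
second_moment = Fraction(sum(v * v * n for v, n in distribution.items()), total)
variance = (second_moment - mean ** 2) / 120
sigma = float(variance) ** 0.5

print(total, mean, float(variance), round(sigma, 2))
```

Dividing each frequency by 60 brings the distribution to the basis of 450 used in the table.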


9. A mind with twelve bonds. But as the number of
bonds in the mind increases, the tetrad-differences crowd
closer and closer to zero. Let us, for example, suppose
exactly the same experiment as above conducted in a
universe of men whose minds could form twelve bonds
(instead of six), the four tests requiring ten, eight, six, and
four of these (instead of five, four, three, and two) (Thomson,
1927b). This increase in complexity enormously
increases the work of calculating all the possible patterns
of overlap, and the frequency of each. There are now
1,257 different square tables of correlation coefficients and
still more patterns of overlap, some of which, however,
give the same correlations. When each possibility is taken
in its proper relative frequency (ranging from once to
11,520 times) there are no fewer than 1,078,110 instances
required to represent the distribution. They have,
nevertheless, all been calculated, and the distribution of
$F_1$ was as follows:


91520 1920 | Viozo | 1920 
F, Freq. | 35 Freq. | 1 re Freq. | 5 28 Freq. 
20 225 7 17,760 | 3 31,432 — 13 624 
18 1.800 6 74,892 | — 4 72,676 — 14 3,792 
16 1,755 5 15,744 — 5 33,808 — 15 | 4,144 
15 4.600 4 52,085 7 6 40,328 — 16 | 3,970 
14 3,840 3 121,608 — 7 21,240 — 18 112 
12 19,610 2 42,384 — 8 41,951 — 19 456 
11 10,632 1 28,096 | —9 5,896 — 20 | 584 
10 8,360 © 122,699 — 10 29,184 —24 28 
9 /26,696/—1 | 63,024 11 | 8,960 | | 
8 37,738 2 81,208 | — 12 15,672 | 


Total 1,078,110 


This table again gives an average value of F₁ exactly
equal to zero. But the separate values of the tetrad- 
difference are grouped more closely round zero than 
before, with a variance now given by— 

$$\sigma^2 = \frac{37{,}166{,}400}{1{,}920 \times 1{,}078{,}110} = 0.018$$

This is rather less than half the previous variance. 
Doubling the number of bonds in the imagined mind has 
halved the variance of the tetrad-differences. If we were 
to increase the number of potential bonds supposed to
exist in the mind to anything like what must be its true 
figure, we would clearly reach a point where the tetrad- 
differences would be grouped round zero very closely 
indeed. 

The principle illustrated by the above concrete example 
can be examined by general algebraic means, and the above 
suggested conclusion fully confirmed (Mackie, 1928a,
1929). It is found that the variance of the tetrad-differences
sinks in proportion to 1/(N - 1), where N is the
number of bonds, when N becomes large, and the above
example agrees with this even for such small N’s as 6 and
12: for

$$\frac{6 - 1}{12 - 1} \times 0.040 = 0.018$$

as found.
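The two imagined experiments can be imitated by simulation. The sketch below is not the exhaustive enumeration described above but a Monte Carlo approximation, and it rests on assumptions that should be stated plainly: each test is taken to draw its bonds at random and independently of the other tests, the correlation between two tests is taken as the number of bonds they share divided by the geometric mean of their richnesses, and F₁ is taken to be r13·r24 - r14·r23 with the tests numbered from richest to poorest. The name `tetrad_variance` and its parameters are hypothetical. Under these assumptions the simulated variances should fall near the values 0.040 and 0.018 quoted in the text.

```python
import random
from math import sqrt

def tetrad_variance(n_bonds, richness, trials=20000, seed=0):
    """Monte Carlo mean and variance of F1 = r13*r24 - r14*r23 (assumed form)."""
    rng = random.Random(seed)
    bonds = range(n_bonds)
    # r_ij = overlap / sqrt(n_i * n_j), so both products share one divisor.
    scale = sqrt(richness[0] * richness[1] * richness[2] * richness[3])
    values = []
    for _ in range(trials):
        # Each test samples its bonds without linkage to any other test.
        t1, t2, t3, t4 = (set(rng.sample(bonds, k)) for k in richness)
        f1 = (len(t1 & t3) * len(t2 & t4) - len(t1 & t4) * len(t2 & t3)) / scale
        values.append(f1)
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / trials
    return mean, var

mean6, var6 = tetrad_variance(6, (5, 4, 3, 2))
mean12, var12 = tetrad_variance(12, (10, 8, 6, 4))
# Compare the twelve-bond variance with the (N - 1) rule applied to six bonds.
print(round(var6, 3), round(var12, 3), round(var6 * (6 - 1) / (12 - 1), 3))
```

A simulation is used for both cases only to keep the sketch short; the six-bond case is small enough to enumerate exactly if preferred.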

In this mathematical treatment, bonds have been spoken
of as though they were separate atoms of the mind, and,
moreover, were all equally important. It is probably
quite unnecessary to make the former assumption, which
may or may not agree with the actual facts of the mind,
or of the brain. Suitable mathematical treatment could
probably be devised to examine the case where the causal
background is, as it were, a continuum, different proportions
of it forming tests of different degrees of richness. And as
for the second assumption, it is in all likelihood merely
formal. Let the continuum be divided into parts of equal
importance, and then the number of these increased and
their extent reduced, keeping their importance equal.
What is necessary, to give the result that zero tetrads are
so highly probable, is that it be possible to take our tests
with equal ease from any part of the causal background ; that
there be no linkages among the bonds which will disturb the
random frequency of the various possible combinations ;
in other words, that there be no “ faculties ” in the mind.
And it is also necessary that all possible tests be taken in 
their probable frequency. 

In any actual experiment, of course, it is quite imprac- 
ticable to take all possible tests, which are indeed infinite 
in number. A sample of tests is taken. If this sample 
is large and random, then there should, in a mind without 
separate “ faculties,” without linkages between its bonds, 
be an approach to zero tetrads. The fact that this ten- 
dency attracted Professor Spearman’s attention, and was 
sufficiently strong to make him at first believe that all 
samples of tests showed it, provided care was taken to 
avoid tests so alike as to be almost duplicates (which 
would be “ statistical impossibilities ” in a random sample), 
indicates that the mind is indeed very free to use its bonds 
in any combination, that they are comparatively unlinked. 

The sampling theory assumes that each ability is com- 
posed of some but not all of the bonds, and that abilities 
can differ very markedly in their “ richness,” some needing 
very many “ bonds,” some only few. It further requires 
some approach to “ all-or-none ” reaction in the “ bonds ” ;
that is, it supposes that a bond tends either not to come
into the pattern at all, or to do so with its full force. This
does not seem a very unnatural assumption to make. It
would be fulfilled if a “ bond ” had a threshold below which
it did not act, but above which it did act ; and this property
is said to characterize neurone arcs and patterns. When
this form of sampling is assumed the rank of the correlation 
matrix tends to be reducible to a small number, if all 
possible correlations are taken, and finally to be one as the 
bonds increase without limit. 

It is important to realize what is meant by the rank 
tending to rank 1 as more and more of the possible corre- 
lations are taken. When the rank is 1 the tetrad- 
differences are zero. But clearly, the reader may say, 
taking more and more samples of the bonds to form more 
and more tests will not change in any way the pre-existing 
tetrad-differences, will not make them zero if they are not
zero to start with. That is perfectly true; but that is not
what is meant. As more and more tests are formed by
samples of the bonds, the number of zero and very small
tetrads will increase and swamp the large tetrads. The
sampling theory does not say that all tetrads will be 
exactly zero, or the rank exactly 1. It says that the 
tetrads will be distributed about zero (not because each 
is taken both plus and minus, but when all are given their 
sign by the same rule) with a scatter which can be reduced 
without limit, in the sense that with more bonds the pro- 
portion of large tetrads becomes smaller and smaller ; 
always provided all possible samples are taken, i.e. that 
the family of correlation coefficients is complete. 

With a finite number of tests this, of course, is not the 
case; but if the tests are a random sample of all possible
tests, there will again be the approach to zero tetrads. 
The same will be true if the tests are sampling not the whole 
mind, but some portion of it, some sub-pool of our mind’s 
abilities. If we stray from this pool and fish in other 
waters, we shall break the hierarchy; but if we sampled 
the whole pool of a mind, we should again find the tendency 
to hierarchical order. If the mind is organized into sub- 
pools (such as the verbal sub-pool, say), then we shall be 
liable to fish in two or three of them, and get a rank of
2 or 3 in our matrix, i.e. get two or three common factors,
in the language of the other theory.
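The connection between rank and tetrad-differences can be checked on a small numerical case. The following is only an illustrative sketch: the loadings are invented for the purpose, and the function names `correlations` and `tetrads` are hypothetical. Four tests drawn from a single pool (rank 1) are compared with four tests that fish in two sub-pools (rank 2).

```python
from itertools import combinations

def correlations(loadings):
    """Correlation r_ij as the sum of products of common-factor loadings."""
    n = len(loadings)
    return {(i, j): sum(a * b for a, b in zip(loadings[i], loadings[j]))
            for i, j in combinations(range(n), 2)}

def tetrads(r):
    """The three tetrad-differences of four tests numbered 0 to 3."""
    return (r[0, 1] * r[2, 3] - r[0, 2] * r[1, 3],
            r[0, 1] * r[2, 3] - r[0, 3] * r[1, 2],
            r[0, 2] * r[1, 3] - r[0, 3] * r[1, 2])

# Invented loadings: one column per pool that the tests draw upon.
one_pool = [[0.9], [0.8], [0.7], [0.6]]
two_sub_pools = [[0.8, 0.0], [0.7, 0.3], [0.3, 0.7], [0.0, 0.8]]

rank_one = tetrads(correlations(one_pool))
sub_pooled = tetrads(correlations(two_sub_pools))
print(rank_one)    # all three vanish apart from rounding
print(sub_pooled)  # none of the three vanishes
```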

10. Contrast with physical measurements. The tendency
for tetrad-differences to be closely grouped around zero
appears to be stronger in mental measurements than elsewhere;
stronger, for example, than in physical measurements,
although it is found there too.

In physical measurements we do not measure a person's
body just from anywhere to anywhere. We observe organs
and measure them: leg, cranium, chest girth, etc. The
variates are not a random sample. In other words, the
physical body has an obvious structure which guides our
measurements, and the tendency to a low rank among the
correlation coefficients, although present, is less than among
mental measurements. The tendency to zero tetrad-differences
in the mind is due to the fact that the mind
has, comparatively speaking, no organs. We can, and do,
measure it almost from anywhere to anywhere. No test 
measures a leg or an arm of the mind; every test calls 
upon a group of the mind’s bonds which intermingles in 
most complicated ways with the groups needed for other 
tests, without being a set pattern immutably linked into 
an organ. Of all the conceivable combinations of the 
bonds of the mind we can, without great difficulty, take a 
random sample, whereas in physical measurements we take 
only the sample forced on us by the organs of the body. 
Being free to measure the mind almost from anywhere to 
anywhere, we can get a set of measurements which show 
“ hierarchical order ” without overgreat trouble. We can
do so because the mind is so comparatively structureless.
Mental measurements tend to show hierarchical order, and
to be susceptible of mathematical description in terms of
one general factor or few, and innumerable specifics, not
because there are specific neural machines through which
its energy must show itself, but just exactly because there
are no fixed neural machines. The mind is capable of
expressing itself in the most plastic and Protean way,
especially before education, language, the subjects of
the school curriculum, the occupation, and the political
beliefs of adult life have imposed a habitual structure on
it. It is not without significance that the “ factor ” most
widely recognized after Spearman’s g is the verbal factor v, 
the mother-tongue being, as it were, the physical body of 
the mind, its acquired structure. 

11. Absolute variance of different tests. It will be noted
that on the sampling theory the different tests will natur- 
ally have different variances, the “ richer ” tests having a 
wider scatter. This seems only natural. It is customary, 
at any rate in theoretical discussions, to reduce all scores 
in different tests to standard measure, thereby equal- 
izing their variance. This seems inevitable, for there 
is no means of comparing the scatter of marks in two 
different tests. But it does not follow that the scatter 
would be really the same if some means of comparison 
were available. When the same test is given to two 
different groups we have no hesitation in ascribing a wider
variance to the one or the other group, and it seems conceivable
that a similar distinction might mentally be made
between the scores made by one group in two different
tests. The writer is completely in accord with M. S. Bartlett
when he says (Bartlett, 1935, 205): “ I think many
people would agree that the variation in mathematical
ability displayed even in a selected group such as Cambridge
Tripos candidates cannot be altogether put down
to the method of marking adopted by the examiners.”
We may put these mathematics marks into standard
measure, and we may put the marks scored by the same
group in, say, a form-board test, also into standard measure.
But that does not imply that at bottom the two variances 
are equal, if only we had some rigorous way of comparing 
them. Our common sense tells us plainly that they are 
not equal in the absolute sense, though for many purposes 
their difference is irrelevant. It seems to be no defect, 
then, but rather a good quality, of the sampling theory 
to involve different absolute variances. 

12. A distinction between g and other common factors.
The writer is inclined to make a distinction in interpretation 
between the Spearman general factor g and the various 
other common factors, mostly if not all of less extent than 
g, which have been suggested. When properly measured 
by a wide and varied hierarchical battery, g appears to him 
to be an index of the span of the whole mind, other common 
factors to measure only sub-pools, linkages among bonds. 
The former measures the whole number of bonds; the 
latter indicate the degree of structure among them. 


Some of this “ structure ” is no doubt innate ; but more
of it is probably due to environment and education and
life. Its expression in terms of separate uncorrelated
factors suggests what is almost certainly not the case, that
the sub-pools are separate from one another. The
actual organization is likely to be much more complicated
than that, and its categories to be interlaced and interwoven,
like the relationships of men in a community,
plumbers and Methodists, blonds, bachelors, smokers,
Conservatives, illiterates, native-born, criminals, and
school-teachers, an organization into classes which cut
across one another right and left.


Further, it is improbable that the organization of each
mind is the same. The phrase “ factors of the mind ”
suggests too strongly that this is so, and that minds differ 
only in the amount of each factor they possess. It is more 
than likely that different minds perform any task or test by 
different means, and indeed that the same mind does so at 
different times. 

Yet with all the dangers and imperfections which attend 
it, it is probable that the factor theory will go on, and will 
serve to advance the science of psychology. For one thing, 
it is far too interesting to cease to have students and 
adherents. There is a strong natural desire in mankind 
to imagine or create, and to name, forces and powers 
behind the façade of what is observed, nor can any excep- 
tion be taken to this if the hypotheses which emerge 
explain the phenomena as far as they go, and are a guide 
to further inquiry. That the factor theory has been a 
guide and a spur to many investigators cannot be denied, 
and it is probably here that it finds its chief justification. 




CHAPTER XXI 
SOME FUNDAMENTAL QUESTIONS 


It seems advisable to conclude with a brief discussion of
some of the fundamental theoretical questions needing an
answer. Among these are the following, of which (1)
and (3) are rather liable to be forgotten by those actually
engaged in making factorial analyses :

(1) What metric or system of units is to be used in 
factorial analysis ? 

(2) On what principle are we to decide where to stop the 
rotation of our factor-axes or how to choose them so that 
rotation is unnecessary ? 

(3) Is the principle of minimizing the number of 
common factors, i.e. of analysing only the communal 
variance, to be retained ? 

(4) Are oblique, i.e. correlated factors to be permitted ? 

1. Metric. Most of the work done in factorial analysis
has assumed the scores of the tests to be standardized ; 
that is to say, in each test the unit of measure has been 
the actual standard deviation found in the distribution. 
This is in a sense a confession of ignorance. The accidental 
standard deviation which happens to result from the par- 
ticular form of scoring used in a test means, of course, 
nothing more. Yet there is undoubtedly something to be 
said for the probability of real differences of standard 
deviation existing between tests (see Chapter XX, 
Section 11). In that case, if we knew these real standard 
deviations, we would use variances and covariances and 
analyse them, not correlations (compare Hotelling, 1933,
421-2 and 509-10).

Burt has urged the use of variances and covariances,
which are indeed necessary to him to enable his relation
between trait factors and person factors to hold (see Chapter
XVII, page 264). But the variances and covariances
he actually uses are simply the arbitrary ones which arise
from the raw scores, and depend entirely upon the scoring
system used in each test. It would seem necessary to
have some system of rational, not arbitrary, units.

Hotelling has already suggested one such, based upon 
the idea of the principal components of all possible tests, 
but it would seem to be unattainable in practice (Hotel- 
ling, 1933, 510). Another can be based on the ideas of the 
sampling theory and has already been foreshadowed in 
the previous chapter. Tests quite naturally have different 
variances on that theory, since they comprise larger or 
smaller samples of the “ bonds ” of the mind (see Thomson, 
1935b, 87). In a hierarchical battery these natural 
variances are measured by the “ coefficient of richness.” 
The “ richness ” of Test k is given by

$$r_{kg}^2,$$

the same quantity as the square of Spearman’s “ saturation
with g.” It is, on the sampling theory, the fraction
which the test forms of the pool of bonds which is being
sampled, and is the natural variance of the test in comparison
with other tests from that pool. The “ saturation
with g ” of Spearman’s theory is the “ natural standard
deviation ” of the sampling theory. Even in a battery
which is not hierarchical, the formula (Chapter II,
Section 5, page 43)

$$\frac{A}{T - 2A}$$

will give a rough estimate of the natural standard deviation
of each test. The general principle is that tests which
show the most total correlation have the largest natural
variance.
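The identity between richness and the square of the saturation with g can be made concrete in a toy model. The sketch below assumes, purely for illustration, that every bond contributes an independent amount of unit variance to a score, and that g is the sum over the whole pool being sampled; the function name `richness_and_saturation` is hypothetical.

```python
from math import sqrt

def richness_and_saturation(test_bonds, pool_size):
    """Natural variance and g saturation of a test built from `test_bonds`.

    Assumed model: independent unit-variance bonds, g = sum over the pool.
    """
    weights = [1.0 if b in test_bonds else 0.0 for b in range(pool_size)]
    var_test = sum(w * w for w in weights)  # variance of the test score
    var_pool = float(pool_size)             # variance of the whole-pool score
    cov = sum(weights)                      # bonds the test shares with the pool
    richness = var_test / var_pool
    saturation = cov / sqrt(var_test * var_pool)
    return richness, saturation

# A test using five of the six bonds in the pool.
richness, saturation = richness_and_saturation({0, 1, 2, 3, 4}, 6)
print(richness, saturation ** 2)
```

On this model the saturation is the square root of the richness, which is the sense in which the saturation acts as a natural standard deviation.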

2. Rotation.—Our views on the rotation of factors will 
depend on what we want them to do. Burt looks upon 
them as merely a convenient form of classification and is 
content to take the principal axes of the ellipsoids of density, 
or that approximation to them given by a good centroid 
analysis, as they stand, without any rotation. He “ takes 
out” the first centroid factor, either by calculation or 
by selecting a very special group of persons each of whom 
has in a battery of tests an average score equal to the
population average, each of the tests also having the same
average as every other test in the battery over this sub-group
of persons (Burt, 1938a). He concentrates attention
on the remaining factors, which are “ bipolar,” having
both positive and negative weights in the tests. When,
as in the article referred to, he is analysing temperaments, 
this fits in well with common names for emotional charac- 
teristics, for those names too are usually bipolar, as 
brave-cowardly, extravagant-stingy, extravert-introvert, 
and so on. 

Thurstone, on the other hand, emphatically insists on 
the need for rotation if the factors are to have psychological
meaning (Thurstone, 1938a, 90). The centroid
factors are mere averages of the tests which happen to 
form the battery, and change as tests are added or taken 
away, whereas he wants factors which are invariant from 
battery to battery. I think he would put invariance 
before psychological meaning, and say that if a certain 
factor keeps turning up in battery after battery we must 
ask ourselves what its psychological meaning is. His 
own opinion, backed up by a great deal of experimental 
work of a pioneering and exploratory nature, is that his 
principle of rotating to “ simple structure ” gives us also
psychologically meaningful and invariant factors. 

The problems of rotation and metric are not unconnected, 
and one piece of evidence in favour of rotating to simple 
structure is that the latter is independent of the units 
used in the tests. If instead of analysing correlations we
analyse covariances, with whatever standard deviations
we care to assign to the tests, we get a centroid analysis
quite different from the centroid analysis of correlations.
But if we rotate each to simple structure the tables are
identical, except, of course, that in the covariance structure
each row is multiplied by the standard deviation of the
test.

For example, if we take the six tests of Chapter XI,
Section 2 (page 152) and ascribe to them standard
deviations of 1, 2, 3, 4, 5, and 6, we can replace the
correlations and communalities by covariances and
variance-communalities, and perform a centroid analysis.
Since we know the proper communalities* it comes out 
exactly in three factors with no residues, and gives the 
centroid structure : 


         I       II      III
1      0.372   0.567   0.462
2      0.948   1.278  -0.060
3      1.969  -1.016  -0.337
4      1.002   1.072  -2.118
5      2.992   0.593   1.716
6      3.379  -2.493   0.337


When this is rotated to simple structure, by post- 
multiplication by the matrix 


 0.802   0.389   0.453
-0.592   0.416   0.691
 0.080  -0.822   0.564


the resulting table is: 


         A       B       C
1        .       .     0.820
2        .     0.950   1.278
3      2.154   0.619     .
4        .     2.577     .
5      2.187     .     2.732
6      4.213     .       .


This is identical with the simple structure found from
the correlations, if the rows here are divided by 1, 2, 3, 4,
5, and 6, the standard deviations. It is definitely a point
in favour of simple structure that it is thus independent
of the system of units employed. Spearman’s analysis of
a hierarchical matrix into one g and specifics also has this
property of independence of the metric. If the tetrad-differences
of a matrix of correlations are zero, and we
analyse into one general factor and specifics, it is immaterial
whether we analyse correlations or covariances. The
loadings obtained in the latter case are exactly the same
except, of course, that each is multiplied by the appropriate
standard deviation.

* If we have to guess communalities, our two simple structures
will differ slightly because the highest covariance in a column may
not correspond to the highest correlation. But with a battery of
many tests this difference will be unimportant, and could be
annulled by iteration.
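The independence of Spearman’s analysis from the metric is easily verified on an invented hierarchical battery. In the sketch below the saturations and standard deviations are hypothetical, and the loadings are recovered by the familiar triad relation (the square of the loading of test k equals the product of its entries with two other tests divided by the entry between those two); `general_loadings` is a name chosen for the illustration only.

```python
from math import sqrt

def general_loadings(m):
    """One-general-factor loadings from off-diagonal entries, by triads."""
    n = len(m)
    loadings = []
    for k in range(n):
        # Any two other tests will serve when the matrix is hierarchical.
        i, j = [t for t in range(n) if t != k][:2]
        loadings.append(sqrt(m[k][i] * m[k][j] / m[i][j]))
    return loadings

saturations = [0.9, 0.8, 0.7, 0.6]   # invented g saturations
sds = [1.0, 2.0, 3.0, 4.0]           # arbitrary units ascribed to the tests
n = len(saturations)

corr = [[saturations[i] * saturations[j] for j in range(n)] for i in range(n)]
cov = [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]

from_corr = general_loadings(corr)
from_cov = general_loadings(cov)
# Dividing each covariance loading by its standard deviation restores the rest.
rescaled = [load / sd for load, sd in zip(from_cov, sds)]
print(from_corr, rescaled)
```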

At this point one is reminded of Lawley’s loadings* 
found by the method of maximum likelihood, for these 
possess the property that the unrotated loadings obtained 
from correlations are already the same as the unrotated 
loadings obtained from covariances, if the latter are 
divided by the standard deviations. Centroid analyses, 
or principal component analyses, do not possess this 
property. The loadings obtained by these means from 
covariances cannot be simply divided by the standard 
deviations to give the loadings derived from correlations, 
though the one can be rotated into the other. Lawley’s 
loadings need no such rotation. They are, as it were, at 
once of the same shape whether from covariances or from 
correlations and only need an adjustment of units, such as 
one makes in changing, say, from yards to feet. A field 
which is 50 yards broad and 20 poles long has the same 
shape as one which is 150 feet broad and 330 feet long. 

Now, as we have seen, this property of equivalence of 
covariance and correlation loadings is also possessed by 
simple structure. It would thus not be unnatural to hope 
that Lawley’s method might lead straight to simple 
structure, without any rotation. But this is not the case. 
Clearly, then, simple structure is not the only position of 
the axes where the loadings are independent of the units of 
measurement employed. Indeed, any subsequent post-multiplication
of both the simple structure tables (both
that from correlations and that from covariances) by the
same orthogonal rotating matrix will leave their equivalence
with regard to units unharmed. Simple structure is only
one of an infinite number of positions which possess this
property. But it is an easily identifiable one.

* In accordance with our definition on page 170, the term “ loading ”
means a coefficient in a specification equation, an entry in a
“ pattern.” In the present chapter it is used in a sense which is
strictly correct when the axes referred to are orthogonal. If the
axes are oblique, then much of what is said really refers to entries
in a structure, not in a pattern ; but the word “ loading ” is used
to avoid circumlocutions, and because the structure of the reference
vectors is, except for a diagonal matrix multiplier, identical with the
pattern of the factors.

It is difficult to keep one’s mind clear as to the meaning 
of this. Let me recapitulate. There are some processes 
of analysis which, while they give a perfect analysis in the 
sense of one which reproduces the correlations (or the co- 
variances) exactly, do not give the same analysis for the 
correlations as for the covariances. The factors they 
arrive at depend upon the units of measurement employed 
in the tests. Such, for example, are the principal compon- 
ents process and the centroid process. Such processes 
cannot be relied on to give straight away and without 
rotation, factors which can be called objective and scien- 
tific. Some processes, on the other hand, do give analyses 
which are independent of the units. One such is Lawley’s, 
based on maximum likelihood. Another is Thurstone’s 
simple-structure process, which, though it begins by using 
a centroid analysis, follows this by rotation of a certain kind. 
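The dependence of an unrotated centroid factor upon the units can be shown with a few lines of arithmetic. The sketch below uses an invented two-factor pattern for four tests, with proper communalities in the diagonal, and takes the first centroid loading of a test as its column sum divided by the square root of the grand total; the function name `first_centroid_loadings` is hypothetical, and only the first factor is extracted.

```python
from math import sqrt

def first_centroid_loadings(m):
    """First centroid factor: column sums over the root of the grand total."""
    col_sums = [sum(col) for col in zip(*m)]
    total = sum(col_sums)
    return [c / sqrt(total) for c in col_sums]

pattern = [[0.8, 0.0], [0.6, 0.3], [0.3, 0.6], [0.0, 0.7]]  # invented loadings
sds = [1.0, 2.0, 3.0, 4.0]                                   # arbitrary units
n = len(pattern)

# Reduced correlation matrix (communalities in the diagonal) and its
# covariance counterpart under the ascribed standard deviations.
corr = [[sum(a * b for a, b in zip(pattern[i], pattern[j])) for j in range(n)]
        for i in range(n)]
cov = [[corr[i][j] * sds[i] * sds[j] for j in range(n)] for i in range(n)]

from_corr = first_centroid_loadings(corr)
from_cov = [load / sd for load, sd in zip(first_centroid_loadings(cov), sds)]
print([round(x, 3) for x in from_corr])
print([round(x, 3) for x in from_cov])
```

The two rows printed do not agree, so that no mere change of units carries the one centroid factor into the other; a rotation is required as well.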

But the principle of independence of units does not 
distinguish between these processes, which both satisfy it. 
Still less does it distinguish between systems of factors. 
For any one of the infinite number of such systems which 
can be got from either simple structure or Lawley’s factors 
by rotation equally satisfies the principle. Indeed, there 
can really be no talk of a system of factors satisfying the 
principle. Any table of loadings whatever, obtained from 
correlations, has, of course, corresponding to it a system 
differing only in that the rows are multiplied by coefficients, 
a system which would correspond with covariances. 
The fact that no one has discovered a process which gives 
both is irrelevant. The argument is rather as follows. If 
a worker believes that he has found a process which gives 
the true psychological factors, then that process must be 
independent of the metric, and simple structure and 
maximum likelihood are both thus independent, though 
they do not, alas, agree. Nor must it be forgotten that
analyses from correlations are in no way superior to those 
from covariances. Indeed, correlations are covariances, 
dependent upon as arbitrary a choice of units—namely 
standard deviations—as any other. But centroid axes 
in themselves, or principal components, without rotation, 
are clearly inadmissible, for they change with the units 
used. The chance that such axes are the true ones is 
infinitesimal, being dependent on the chance composition 
of the battery, and the system of units which chances to 
be used. Independence of metric is not sufficient to
validate a process but it is necessary. Its absence does 
not prove a system of factors to be wrong, but it makes it 
certain that the process by which they have been arrived 
at does not in general give the true factors.

3. Specifics. These form a fundamental problem in
factorial analysis and yet they are practically never heard 
of in discussions of an analysis. It is reasonable enough to 
think that a test may require some trick of the intellect 
peculiar to itself, yet it is not obvious that these specific 
factors must be made as large and important as possible ; 
and that is what the plan of minimizing the rank of a 
matrix does. The excess of factors over tests which 
inevitably, of course, results from postulating a specific in 
every test, means that the factors cannot be estimated with 
any great accuracy. Usually the accuracy is very low 
indeed. The determinate and the indeterminate parts of 
each of Thurstone’s factors in Primary Mental Abilities can 
be found by post-multiplying Table 7 on his page 98 by 
Table 8 on his page 96. We find : 


          Variance of the    Variance of the
Factor    Estimated Part     Indeterminate Part
S             0.611              0.389
P             0.616              0.384
N             0.825              0.175
V             0.662              0.338
M             0.491              0.509
W             0.439              0.561
I             0.397              0.603
R             0.600              0.400
D             0.519              0.481
The average for the nine factors is only 56½ per cent. of
the variance estimated. In other words the factor
estimates have large probable errors, in some cases as large
as the estimates themselves. This has serious conse- 
quences, not to be overcome by more reliable tests. 
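The division of a factor’s variance into an estimated and an indeterminate part can be illustrated in the simplest case of a single general factor with specifics. This is a sketch of that special case only, not of the calculation from Thurstone’s tables: the loadings are invented, the function name `estimated_variance` is hypothetical, and the formula used is the squared multiple correlation of the factor with the tests, S/(1 + S), where S is the sum of l²/(1 - l²) over the tests.

```python
def estimated_variance(loadings):
    """Share of a single common factor's variance recovered by regression."""
    s = sum(l * l / (1 - l * l) for l in loadings)
    return s / (1 + s)

loadings = [0.8, 0.7, 0.6, 0.5]   # invented saturations of four tests
estimated = estimated_variance(loadings)
indeterminate = 1 - estimated
print(round(estimated, 3), round(indeterminate, 3))
```

Adding further tests, or tests of higher saturation, raises S and so shrinks the indeterminate part, but with loadings less than unity it never vanishes for a finite battery.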

Using unity for every diagonal element in the matrix of 
such a battery will give factors (supposing the same 
number of them to be taken out) which will not imitate the 
correlations quite so well, but which can be estimated 
accurately. 

In fact, whether Hotelling’s process or the centroid 
process is used, with unit communalities, each factor can 
be calculated exactly for a man, given his scores. By 
exactly we mean that they are as accurate as his scores are. 
Of course, in any psychological experiment the scores may 
not be accurate in the sense that they can be exactly 
reproduced by a repetition of the experiment. Apart from 
sheer blunders and clerical errors, there is the fact that a 
man’s performance fluctuates from day to day. But 
these errors are common to any process of calculation 
which may be used on the scores. These are not the errors 
for which we are criticizing estimates of a man’s factors. 
The point we are making is that factors based on com- 
munalities less than unity have a further, and large, error 
of estimation, whereas factors based on unit communalities 
(even if only one or two or a few are taken out) have no 
such further error of estimation. 

If a few such factors taken out with unit communalities 
are then rotated (keeping them in the same space, i.e. not
changing their number) they still remain susceptible of 
exact estimation in a man. 

As soon, however, as any fractions, minimum or not, are 
placed in the diagonal cells, we have thereby decided to 
use, in describing our tests, more orthogonal axes than there 
are tests ; for each test has then a specific factor, and there 
are in addition the common factors. This means in terms 
of our spatial model that none of the axes, neither the 
common factors nor the specific factors, are in the test 
space at all (except at the origin where they all cross). It 
is only about the test space, of dimensions equal to the
number of tests, that we have any information from our
battery. These axes are away in outer darkness and we 
cannot know them, but only their projections or shadows 
on the test space. Psychologists invariably confine their 
attention, after making an analysis using communalities, 
to the “ common factor space,” of a comparatively small 
number of dimensions, without, I think, being usually 
aware that this space is not in the test space at all. (Thur- 
stone’s “ secondary factors,” in their turn, are not even in 
the common factor space, for he uses what I might call 
secondary communalities.) The effect of all this is that the 
factors arrived at by an analysis which has begun by 
placing fractions in the diagonal cells can never be measured 
in any man, but only vaguely estimated, and with maxi- 
mum vagueness if minimum communalities are used. 

In itself the fact that factors can only be estimated and
not accurately measured is, of course, not fatal. Throughout
statistical work runs the idea of estimation in a realm
outside that which is experimentally known, in a realm of
more dimensions than that in which our measurements
have been made. It is to allow for that that the device of
“ degrees of freedom ” is used in the analysis of variance.
But in factorial analysis the vagueness due to estimation
is deliberately maximized, for reducing the rank of a matrix
of correlations involves the simultaneous maximizing of
the specific variances. In Section 8 of the previous chapter
a brief reference was made to this fact that methods of
factorizing which use communalities maximize the variance
of the specific factors, by reason of minimizing the number
of common factors. First take the case of the analysis of a
hierarchical battery. As was illustrated in Chapter XX
the analysis of such a battery into one general factor only,
and specifics, gives the maximum variance possible to the
specifics. The combined communalities of the tests are
less in the two-factor analysis than in any other analysis.
The mathematical expression of this is that the trace of the
reduced correlation matrix, i.e. the sum of the cells of the
principal diagonal, is a minimum.

It is true that certain exceptions to this statement are
mathematically possible, but their occurrence in actual
psychological work is a practical impossibility. They have
been investigated by Ledermann (Ledermann, 1940), who 
finds, in the case of the hierarchical matrix, that an excep- 
tion is only possible when one of the g saturations is greater 
than the sum of all the others. When the battery is of 
any size, this is most unlikely to occur: and almost always, 
when it did occur, the large saturation of one test would 
turn out to be greater than unity, which is not permissible 
(the Heywood case). 

The same statement as the above, that the specifics are 
maximized, is also true in general. The communalities 
which give the matrix its lowest rank are in sum less than 
any other diagonal elements permissible. If smaller 
numbers are placed in the diagonal cells, the analysis fails 
unless factors with a loading of √−1 are employed, and
such factors are, of course, inadmissible. 

Here again there are possibly cases where the lowest 
rank is not accompanied by the lowest trace (i.e. the lowest 
sum of the communalities). But here again it seems cer- 
tain that if such cases do exist, they are mathematical 
curiosities which would never occur in practice. 

If specific factors of such large size have any psychological
existence, what can they be? Possibilities which
will occur to us are first, that they are error factors—but 
errors or variations in the subject’s performance are not 
likely to be entirely unique to one test. Secondly, they 
have been attributed to sampling errors in the coefficients 
of correlation—but these sampling errors are themselves 
correlated, and so give rise to false common factors, not to 
specific factors. Thirdly, they may be real mental factors, 
unique to that test, needed only by it. But what remark- 
able consequences follow if we accept that. I devise a 
brand-new test and, lo, in the mind of man there exists a 
specific ability to do that test and, moreover, an ability 
which is useless in every other activity. Further, every 
individual I meet possesses this specific ability in large
or small amount. The idea in this form is really fan- 
tastic. 

It would seem then that the specifics cannot be really
unique, but only unique in this battery. This leads to the
presumption that the tests of a battery possess specific
factors only because there does not happen to be in the 
battery any other test to share the specific, or at least part 
of it, and prove it to be really one or more common factors. 
On this view, specifics will disappear when a test has been
tried in a large number of batteries, or in a sufficiently large 
battery. Not only does this seem unlikely when one 
considers that in every battery the minimum communalities 
and maximum specifics are insisted on, but it also has 
peculiar consequences in regard to the number of primary 
factors. Consider a battery consisting of, say, two dozen 
tests, analysed into, say, seven common factors plus, of 
course, two dozen specifics. The latter, it must be re- 
membered, are all orthogonal, all uncorrelated with one 
another. On the hypothesis that they are really factors 
which just do not happen to have found a partner, like 
wallflowers at a ball, there must exist at least two dozen 
other primary factors waiting to be discovered in a larger 
battery. And so with every battery of tests. The number
of primary factors must be larger than all the tests hitherto 
invented, which does not seem to be parsimonious. I can- 
not help fearing that there is something wrong with the idea 
of reducing the matrix of correlation coefficients or covariances
to its lowest possible rank, and then calling the
descriptive variates to which this leads “factors of the
mind”: something wrong with the whole idea of attributing
as much as possible of the variance of a test to a
unique factor, something wrong with the “parsimony”
argument upon which all this is based. It leads to
many difficulties to which it is possible, but not, I think,
advisable, to shut one’s eyes. Moreover, the reciprocity
principle, which identifies factors and loadings obtained
from correlating tests with loadings and factors obtained
from correlating persons, works only when there are no
specifics involved. I would like to see a number of existing
squares of correlation coefficients re-analysed with unit
variance in each diagonal cell and the results considered.
There would be no guessing of the communalities, and no
repetitions or iterations of the calculation to determine
them. Tests of significance of residues would be more
easily made, and although rather more factors would be
necessary before the residues became insignificant, they 
would have the advantage of more accurate estimation in 
any man. True, such factors would be confined to the 
particular test space of that battery, and admittedly a 
factor of the mind is not likely to be an exact composite of 
the tests of any one battery. But the point is an academic 
one, for the common-factor space in which communality 
factors exist, is just as much a creation of the particular 
battery as are axes determined within the battery
space. 

I must not be misunderstood as saying that no specific 
factors exist at all. What I am sceptical about is the pro- 
cedure of making the specific factors in every battery as 
large as possible, by the automatic application of a mathe- 
matical device. That every test may well have some 
unique quality for any individual person seems conceivable, 
though I do not think this special feature of the test will 
be felt as a peculiarity by every person who tries the test. 
I think any such unique quality would be a blemish in the 
test, just as unreliability is a blemish, and that the psych- 
ologist should endeavour to make tests which are neither 
unreliable nor burdened with unique peculiarities. Prob- 
ably he cannot avoid a certain amount of uniqueness, just 
as he cannot avoid a certain amount of unreliability. But
I do not see the need for ascribing maximum uniqueness in 
order to reduce the number of common factors. 

A critic may point out that, if even small truly unique 
parts of the tests are admittedly present, there will always 
be the need for the large number of specifics. Possibly so 
—but specifics of no great importance, if the tests are 
good ones; specifics with an influence as unimportant as
the causes are of the residuals which we in any case ignore 
after statistical testing. 

It is true that by the use of communalities the total 
number of loadings to be estimated is reduced to a mini- 
mum. That way of putting the parsimony argument 
would be perhaps defensible. What I doubt is whether too 
high a price is not paid, since this same procedure maximizes
the specifics, and decides their importance without
any psychological consideration whatever being given to
the question.

The practical conclusions I would draw from these con- 
siderations about the nature of specific factors are that a 
battery used for factorial analysis should be composed of 
tests of high communality in that battery: or that, if 
tests are admitted which by the mathematical principle 
of rank reduction are allotted low communalities, the 
psychologist should agree that these tests do draw, each 
of them, upon factors of the mind not represented elsewhere 
in the battery. 

Such is the argument against minimum communalities. 
For them is the hope that some day, despite their draw- 
backs, the factors they lead to may prove to be something 
real, perhaps have some physiological basis. And their
defender may plead that the estimates of these factors are
as good as the estimates we find useful, in predicting
educational or occupational efficiency.

4. Oblique factors.—I think it is pretty certain that
Thurstone took to oblique factors because he wants simple
structure at all costs. Certainly oblique factors make it
much easier to reach simple structure; too easy, Reyburn
and Taylor say. It will be found far more often than it
really exists, they add. On the other hand, Thurstone
can point to his box example and his trapezium example
and say with truth that simple structure enabled him
to find “realities,” can say that the oblique simple structure
is something more real, in the ordinary common
everyday use of the word, than the orthogonal second-order
factors which are an alternative.

Other workers, not at all […] structure, have also declared […],
e.g. Raymond Cattell, and, I think, many who […]
to work in terms of “clusters.” Weight and height are both
measures of some[…] they are correlated. We could analyse […]
uncorrelated factors a and b, or into three for that matter,
but certainly no one would use these in […]. It
is, however, just conceivable that some pair of […]
(say) might be found which corresponded, not one […]
to height and one to weight, but one to orthogonal factor
a and another to orthogonal factor b. It is far too early
to state anything more than a preference for orthogonal
or oblique factors. Opinion is turning, I think, toward
the acceptance of the latter.

The sampling of bonds.


1. Textbooks on matrix algebra.—Some knowledge of 
matrix algebra is assumed, such as can be gained from the 
mathematical introduction to L. L. Thurstone's Multiple 
Factor Analysis (Chicago, 1947); Turnbull and Aitken's 
Theory of Canonical Matrices, Chapter I (London and 
Glasgow, 1932); H. W. Turnbull’s The Theory of Determinants,
Matrices, and Invariants, Chapters I-V (London
and Glasgow, 1929); and M. Bôcher’s Introduction to 
Higher Algebra, Chapters II, V, and VI (New York, 1936). 

I have adopted Thurstone’s notation in Sections 19 
and 19a of the mathematical appendix, and in Chapters 
XI, XII, and XIII in describing his work. But I have not 
made the change elsewhere because readers would then be 
incommoded in consulting my own former papers. 


The chief differences are as follows:
My M is Thurstone’s F, for centroid factors, my Z is
Thurstone’s S ÷ √N, and my F is Thurstone’s P ÷ √N.

2. Matrix notation.—Let X be the matrix of raw scores
of p persons in n tests, with n rows and p columns; and
when normalized by rows, let it be denoted by Z. The
letters z and Z in the text of this book mean standardized
scores, which are used in practical work, but in this
appendix they mean normalized scores, so that

$$ZZ' = R \tag{1}$$

the matrix of correlations between n tests.
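
As a rough numerical check of Equation (1), the following is a minimal sketch in Python with NumPy. The raw scores are invented for illustration, and `normalize_rows` is a hypothetical helper name; "normalized" is taken here to mean that each row is centred and scaled to unit sum of squares.

```python
import numpy as np

def normalize_rows(X):
    """Centre each row (one test) and scale it to unit sum of squares."""
    centred = X - X.mean(axis=1, keepdims=True)
    return centred / np.sqrt((centred ** 2).sum(axis=1, keepdims=True))

# Hypothetical raw scores: n = 3 tests (rows), p = 5 persons (columns).
X = np.array([[12.0, 15.0,  9.0, 20.0, 14.0],
              [ 3.0,  7.0,  4.0,  9.0,  5.0],
              [30.0, 28.0, 35.0, 25.0, 31.0]])

Z = normalize_rows(X)
R = Z @ Z.T          # Equation (1): the matrix of correlations between the tests
```

With rows normalized in this way, no division by the number of persons is needed: the product ZZ' is already the correlation matrix.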

For many purposes it is convenient to think of solid 
matrices like Z as column (or row) vectors of which each 
element represents a row (or column). Thus Z can be 
thought of as a column vector z, of which each element 
represents in a collapsed form a row of test scores. Thus 
with three tests and four persons— 


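$$z = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} z_{11} & z_{12} & z_{13} & z_{14} \\ z_{21} & z_{22} & z_{23} & z_{24} \\ z_{31} & z_{32} & z_{33} & z_{34} \end{bmatrix} = Z \tag{2}$$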


In the theory of mental factors each score is represented 
as a loaded sum of the normalized factors f, the loadings 
being different for each test, i.e.— 


z = Mf (specification equations) . (3) 


where M is the matrix of loadings and f the vector of v
factors, collapsed into a column from F, the full matrix
of dimensions v × p.

We note that p = number of persons,
             n = number of tests,
             v = number of factors.
The dimensions of M are n × v. Equation (3) represents
n simultaneous equations, and the form Z = MF represents
np simultaneous equations.

We now have—

$$R = ZZ' = (MF)(MF)' = MFF'M' \tag{4}$$

If the factors are orthogonal, we have—

$$FF' = I \tag{5}$$

the unit matrix, and therefore

$$R = MM' \tag{6}$$

The resemblance in shape between this and—

$$R = ZZ' \tag{1}$$

leads to a parallelism between formulæ concerning persons
and factors (Thomson, 1935b, 75; Mackie, 1928a, 74, and
1929, 34).

3. Spearman’s Theory of Two Factors assumes that M
is of the special form—

$$M = \begin{bmatrix} l_1 & m_1 & & & \\ l_2 & & m_2 & & \\ \vdots & & & \ddots & \\ l_n & & & & m_n \end{bmatrix} = (\,l \mid M_1\,) \tag{7}$$

and therefore

$$R = ll' + M_1^2 \tag{8}$$

where $M_1$ is the diagonal matrix which forms the right-hand
end of M, and $l$ is the first column of M. In this
form it is clear that R is of rank 1 except for its principal
diagonal. Its component $ll'$ is the “reduced correlational
matrix” of the Spearman case, and is entirely of rank 1.
The elements $l_1^2, l_2^2, \ldots, l_n^2$ which form the principal
diagonal of $ll'$ are called “communalities.”
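
The Spearman form can be checked numerically. The sketch below (Python with NumPy) uses four invented g saturations and assumes each test has unit variance, so that the specific loading is the square root of one minus the communality; it builds M as in Equation (7) and forms R as in Equation (8).

```python
import numpy as np

l = np.array([0.9, 0.8, 0.7, 0.6])        # hypothetical g saturations
m = np.sqrt(1.0 - l ** 2)                 # specific loadings (unit test variance assumed)
M = np.hstack([l[:, None], np.diag(m)])   # Equation (7): M = (l | M1)
R = M @ M.T                               # equals l l' + M1^2, Equation (8)

reduced = np.outer(l, l)                  # reduced correlation matrix, rank 1
# One tetrad-difference of the off-diagonal correlations; it vanishes in a hierarchy.
tetrad = R[0, 2] * R[1, 3] - R[0, 3] * R[1, 2]
```

The diagonal of `reduced` holds the communalities, while the diagonal of `R` itself is unity.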
4. Multiple common factors.—When more than one
common factor is present, M takes the form—

$$M = (\,M_0 \mid M_1\,) \tag{9}$$

where $M_0$ is the matrix of loadings of the common factors,
represented in the Spearman case by the simple column $l$.
We have then—

$$R = MM' = M_0M_0' + M_1^2 \tag{10}$$

where the “reduced correlation matrix” $M_0M_0'$ is of
rank r, the number of common factors, and is identical
with R except for having “communalities” in its principal
diagonal.
5. Orthogonal rotations.—If we express the v factors f in
terms of w new factors $\varphi$ by the equation—

$$f = A\varphi \tag{11}$$

where A is a matrix of v rows and w columns, we have—

$$z = Mf = MA\varphi \tag{12}$$

an expression of the tests z as linear loaded sums of a
different set of factors, with a matrix of loadings MA.
If—

$$AA' = I \tag{13}$$

the new factors $\varphi$ are orthogonal like the old ones. They
can be as numerous as we like, but not less than the number
of tests unless the matrix R is singular. (12) represents a
rigid rotation of the orthogonal axes f into new positions,
with dimensions added or abolished.

6. The sampling theory.—The following transformation
is of interest as showing the connexion between the
Theory of Two Factors and the Sampling Theory (Thomson,
1935b, 85). We shall write it out for three tests only,
but it is quite general. Consider the orthogonal matrix:

$$\begin{bmatrix}
lll & mll & lml & llm & mml & mlm & lmm & mmm \\
mll & -lll & mml & mlm & -lml & -llm & mmm & -lmm \\
lml & mml & -lll & lmm & -mll & mmm & -llm & -mlm \\
llm & mlm & lmm & -lll & mmm & -mll & -lml & -mml \\
mml & -lml & -mll & mmm & lll & -lmm & -mlm & llm \\
mlm & -llm & mmm & -mll & -lmm & lll & -mml & lml \\
lmm & mmm & -llm & -lml & -mlm & -mml & lll & mll \\
mmm & -lmm & -mlm & -mml & llm & lml & mll & -lll
\end{bmatrix} \tag{14}$$

wherein the omitted subscripts 1, 2, and 3 are to be
understood as existing always in that order, so that $mll$
means $m_1l_2l_3$.

If we take for A in Equation (12) the first four rows
of this orthogonal matrix, and for M the Spearman form
(7) with three tests, the result is to transfer to eight new
factors, yielding:

$$\begin{aligned}
z_1 &= l_2l_3\varphi_1 + m_2l_3\varphi_3 + l_2m_3\varphi_4 + m_2m_3\varphi_7 \\
z_2 &= l_1l_3\varphi_1 + m_1l_3\varphi_2 + l_1m_3\varphi_4 + m_1m_3\varphi_6 \\
z_3 &= l_1l_2\varphi_1 + m_1l_2\varphi_2 + l_1m_2\varphi_3 + m_1m_2\varphi_5
\end{aligned} \tag{15}$$

Each z is here in normalized units. If, however, we
change to new units by multiplying the three equations
by $l_1$, $l_2$, and $l_3$ respectively, we have:

$$\begin{aligned}
l_1z_1 &= l_1l_2l_3\varphi_1 + l_1m_2l_3\varphi_3 + l_1l_2m_3\varphi_4 + l_1m_2m_3\varphi_7 \\
l_2z_2 &= l_1l_2l_3\varphi_1 + m_1l_2l_3\varphi_2 + l_1l_2m_3\varphi_4 + m_1l_2m_3\varphi_6 \\
l_3z_3 &= l_1l_2l_3\varphi_1 + m_1l_2l_3\varphi_2 + l_1m_2l_3\varphi_3 + m_1m_2l_3\varphi_5
\end{aligned} \tag{16}$$

and the variates $l_1z_1$, $l_2z_2$ and $l_3z_3$ are now susceptible of
the explanation that each is composed of $l_i^2N$ small equal
components drawn at random from a pool of N such
components, all-or-none in nature. In that case $l_1^2l_2^2l_3^2N$
components would probably appear in all three drawings
($\varphi_1$); $l_1^2l_2^2m_3^2N$ components would probably appear in the
first two drawings, but not in the third ($\varphi_4$); and so on
down to $m_1^2m_2^2m_3^2N$ components, which would not appear
at all ($\varphi_8$, which is missing from the equations).

The transformation can, of course, be reversed, and the 
sampling theory equations converted into the two-factor 
equations. 
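
The transformation of this section can be verified numerically. The sketch below (Python with NumPy) rests on one reading of matrix (14), namely that it is the Kronecker product of three 2 × 2 orthogonal blocks, one per test, with rows and columns reordered; that construction, the saturations, and the variable names are assumptions made for illustration.

```python
import numpy as np

l = np.array([0.8, 0.6, 0.5])               # hypothetical g saturations of three tests
m = np.sqrt(1.0 - l ** 2)

# One 2 x 2 orthogonal block per test; their Kronecker product is orthogonal.
Q = [np.array([[l[i], m[i]], [m[i], -l[i]]]) for i in range(3)]
K = np.kron(np.kron(Q[0], Q[1]), Q[2])

# Reorder rows and columns to lll, mll, lml, llm, mml, mlm, lmm, mmm.
order = [0, 4, 2, 1, 6, 5, 3, 7]
T = K[np.ix_(order, order)]

A = T[:4]                                   # first four rows of (14)
M = np.hstack([l[:, None], np.diag(m)])     # Spearman form (7), three tests
loadings = M @ A                            # 3 x 8 loadings on the eight new factors
```

The rows of `loadings` reproduce the coefficients of Equations (15), the correlations `loadings @ loadings.T` are unchanged, and the last column is zero, which is why the eighth factor is missing from the equations.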

7. Hotelling’s “principal components” are the principal
axes of the ellipsoids of equal density—

$$z'R^{-1}z = \text{constant} \tag{17}$$

when the test vectors are orthogonal axes (Hotelling, 1933).
To find the principal axes involves finding the latent
roots of $R^{-1}$. The Hotelling process consists of (a) a
rotation of the axes from the orthogonal test axes to the
directions of the principal axes; and (b) a set of strains
and stresses along these new axes to standardize the factors,
making the ellipsoid spherical and the original axes oblique.
The transformation from the tests to the Hotelling factors
$\gamma$ being from Equation (3)—

$$z = M\gamma \quad (M \text{ square})$$

the ellipsoids (17) become—

$$\text{constant} = z'R^{-1}z = \gamma'M'R^{-1}M\gamma = \gamma'\gamma \tag{18}$$

since they become spheres. Therefore we must have—

$$M'R^{-1}M = I \tag{19}$$

The locus of the mid points of chords of $z'R^{-1}z$ whose
direction cosines are $h'$ is the plane $h'R^{-1}z = 0$, and if this
is a principal plane it is at right angles to the chords it
bisects, i.e.—

$$h'R^{-1} = \lambda h'$$

which has non-trivial solutions only for—

$$|R^{-1} - \lambda I| = 0$$

the roots $\lambda$ of which are the “latent roots” of $R^{-1}$, while
each $h'$ is a “latent vector.”

Now, if H is the matrix of normalized latent vectors of
$R^{-1}$, we have

$$H'R^{-1}H = \Lambda$$

where $\Lambda$ is the diagonal matrix of the latent roots of $R^{-1}$,
so that a solution for M corresponding to rotation to the
principal axes and subsequent change of units to give a
sphere is seen to be—

$$M = H\Lambda^{-\frac{1}{2}} \tag{20}$$

The latent vectors of R are the same as those of $R^{-1}$,
or of any power of R, and Hotelling’s process described
in the text (Chapter VII) finds the latent roots (forming the
diagonal matrix D) and the latent vectors (forming H) of
R. We then have—

$$M = HD^{\frac{1}{2}} \tag{21}$$

For the convergence of the process, see Hotelling’s paper
of 1933, pages 14 and 15.

Since in Hotelling analyses M is square, we can write—

$$\gamma = M^{-1}z = (HD^{\frac{1}{2}})^{-1}z = D^{-\frac{1}{2}}H'z = D^{-1}(D^{\frac{1}{2}}H')z = D^{-1}M'z \tag{22}$$

Each factor $\gamma$, that is, can be found from a column of
the matrix M, divided by the corresponding latent root,
used as loadings of the test scores z.
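
Equations (19), (21) and (22) can be illustrated with a small sketch in Python with NumPy. It does not follow Hotelling's iterative process; the library routine `numpy.linalg.eigh` is substituted to obtain the latent roots and vectors, and the correlation matrix and scores are invented.

```python
import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])            # hypothetical correlation matrix

roots, H = np.linalg.eigh(R)               # latent roots and normalized latent vectors of R
roots, H = roots[::-1], H[:, ::-1]         # largest root first
D = np.diag(roots)

M = H @ np.sqrt(D)                         # Equation (21): M = H D^(1/2)
z = np.array([1.2, -0.3, 0.5])             # hypothetical scores of one person
gamma = np.linalg.inv(D) @ M.T @ z         # Equation (22): gamma = D^(-1) M' z
```

Because M is square here, the factors are computed exactly from the scores and `M @ gamma` returns `z`; nothing has to be estimated.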
8. The pooling square.—If the matrix of correlations of
a + b variates is:

$$\begin{bmatrix} R_{aa} & R_{ab} \\ R_{ba} & R_{bb} \end{bmatrix} \tag{23}$$

and if the standardized variates a are multiplied by weights
u, the standardized variates b by weights w, and each set
of scores summed to make two composite scores, the
resulting variances and covariances are:

$$\begin{bmatrix} u'R_{aa}u & u'R_{ab}w \\ w'R_{ba}u & w'R_{bb}w \end{bmatrix} \tag{24}$$

as can be seen by writing out the latter expressions at
length. The battery intercorrelation is therefore—

$$\frac{u'R_{ab}w}{\sqrt{(u'R_{aa}u)(w'R_{bb}w)}} \tag{25}$$

If weights are applied to raw scores, each applied weight
must be multiplied by each pre-existing standard deviation,
in (25).
If there is only one variate in the a team, (25) becomes—

$$\frac{w'r_{ba}}{\sqrt{w'R_{bb}w}} \tag{26}$$

where $r_{ba}$ represents a whole column of correlation coefficients.
The values of w for which this reaches its maximum
value will satisfy the equation—

$$\frac{\partial}{\partial w}\left\{\frac{w'r_{ba}}{\sqrt{w'R_{bb}w}}\right\} = 0 \tag{27}$$

that is

$$w = \text{a scalar} \times R_{bb}^{-1}r_{ba} \tag{28}$$

consistent with the ordinary method of deducing regression
coefficients.
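
A short numerical illustration of (26) and (28) follows, in Python with NumPy. The correlations are invented, and `battery_correlation` is a hypothetical helper name; the scalar in (28) is simply taken as unity.

```python
import numpy as np

# Hypothetical correlations: one criterion (team a) and three tests (team b).
R_bb = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
r_ba = np.array([0.6, 0.5, 0.4])

def battery_correlation(w):
    """Expression (26): correlation of the criterion with the weighted b team."""
    return (w @ r_ba) / np.sqrt(w @ R_bb @ w)

w_best = np.linalg.solve(R_bb, r_ba)       # Equation (28), scalar taken as 1
multiple_r = battery_correlation(w_best)   # the multiple correlation
```

Any other set of weights, unless proportional to `w_best`, gives a smaller value of (26), and rescaling the weights leaves the correlation unchanged.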
9. The regression equation.—If $z_0$ is the one variate in
the a team, and z are the b team, and if—

$$\hat z_0 = w'z \tag{29}$$

we wish to make $S(z_0 - \hat z_0)^2$ a minimum, that is—

$$\begin{aligned}
\frac{\partial}{\partial w} S(z_0 - w'z)^2 &= 0 \\
S z_0 z' &= w' S zz' \\
r_{ba}' &= w'R_{bb} \\
w' &= r_{ba}'R_{bb}^{-1} \\
\hat z_0 &= r_{ba}'R_{bb}^{-1}z
\end{aligned} \tag{30}$$

If R is the matrix of correlations of all the tests including
$z_0$, the regression estimate of any one of the tests from a
weighted sum of the others is given by—

$$\text{determinant } R_z = 0 \tag{31}$$

where $R_z$ is R with the row corresponding to the variate
to be estimated replaced by the row of variates.

9a. Relations between two sets of variates (Hotelling,
1935a, 1936; M. S. Bartlett, 1948).—If two sets of variates
have correlation coefficients

$$\begin{bmatrix} A & C \\ C' & B \end{bmatrix}$$

and if the variates of the B team are fitted with weights b,
then the correlations of the B team, thus weighted, with
the separate tests of the A team are given by

$$c = \frac{Cb}{\sqrt{b'Bb}} \tag{31.1}$$

and the square of the correlation coefficient between the
two teams is then—

$$\lambda = c'A^{-1}c = \frac{b'C'A^{-1}Cb}{b'Bb} \tag{31.2}$$

The maximum intercorrelation, and other stationary
values of $\lambda$, will be given by—

$$\frac{\partial\lambda}{\partial b} = 0, \quad\text{that is}\quad (C'A^{-1}C - \lambda B)b = 0 \tag{31.3}$$

a set of homogeneous equations in b. We must therefore
have—

$$|C'A^{-1}C - \lambda B| = 0 \tag{31.4}$$

an equation for $\lambda$ with as many non-zero roots as the number
of variates in the smaller team. For any one of these
roots $\lambda$, the weights b are proportional to the co-factors of
any row of $(C'A^{-1}C - \lambda B)$. The corresponding weights
a for the A team are then found by condensing the team B
(using weights b) to a single variate and carrying out an
ordinary regression calculation.

The result is to “factorize” each team into as many
orthogonal axes as there are variates. These axes are related
to one another in pairs corresponding to the roots $\lambda$.
Each axis is orthogonal to all the others except its own
opposite number in the space of the other team, arising
from the same root $\lambda$ as it does, to which axis it is inclined
at an angle arccos $\sqrt{\lambda}$. Where one team has m more
variates than the other, m of the roots will be zeros and
the corresponding axes will be at right angles to the whole
space of the other team. This form of factorizing has been
called by M. S. Bartlett (1948) external factorizing, since
the position of the “factors” or orthogonal axes in each
team, in each space, is dictated by the other team.
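
The calculation for two teams can be sketched as follows in Python with NumPy. The correlations are invented, and the roots of the determinantal equation for the two teams are found here as the latent roots of $B^{-1}C'A^{-1}C$ by a library routine rather than by co-factors, which is a substitution made for brevity.

```python
import numpy as np

# Hypothetical correlations: team A has two variates, team B has three.
A = np.array([[1.0, 0.4],
              [0.4, 1.0]])
B = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
C = np.array([[0.5, 0.4, 0.3],
              [0.3, 0.4, 0.2]])             # correlations of A tests with B tests

G = C.T @ np.linalg.solve(A, C)             # C' A^(-1) C
lam, vecs = np.linalg.eig(np.linalg.solve(B, G))   # roots of |C'A^(-1)C - lambda B| = 0
idx = np.argsort(lam.real)[::-1]            # largest root first
lam, vecs = lam.real[idx], vecs.real[:, idx]

b = vecs[:, 0]                              # weights for team B, largest root
a = np.linalg.solve(A, C @ b)               # regression weights for team A
team_r = (a @ C @ b) / np.sqrt((a @ A @ a) * (b @ B @ b))
```

The correlation between the two weighted teams equals the square root of the largest root, and since team B has one more variate than team A, one root is zero.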

The weightings corresponding to the largest root give
the closest possible correlation of the two weighted teams.
If the two teams are duplicate forms of the same tests, this
is the maximum attainable battery or team reliability
(Thomson 1940, 1947, 1948). In this case Peel (Nature,
1947) has shown that a simpler equation than 31.4 gives
the required roots. If $\lambda = \mu^2$ Peel’s equation is—

$$|C - \mu A| = 0 \tag{31.5}$$

where A differs from C only in the diagonal elements, which
in A are unities but in C are reliabilities $r_{ii}$ of the individual
tests.

Green (1950) gives a transformation of this equation
which enables Hotelling’s iterative process (see Chapter
VII) to be used to find $\mu$, the maximum battery reliability.
For the diagonal elements $r_{ii} - \mu$ of the matrix $(C - \mu A)$,
Green writes—

$$r_{ii} - \mu = \{r_{ii} - \beta(1 - r_{ii})\}(1 - \mu)$$

when 31.5 becomes equivalent to—

$$|DCD - \beta I| = 0 \tag{31.6}$$

wherein D is a diagonal matrix with elements $1/\sqrt{1 - r_{ii}}$,
I is the unit matrix, and $\beta = \mu/(1 - \mu)$. The latent vector
V corresponding to the largest latent root of DCD can then
be found by Hotelling’s process, and the best weights for
maximum battery reliability are proportional to DV = W.
The maximum reliability thus attained is—

$$\mu = W'CW / W'AW$$

10. Regression estimates of factors.—When in the specifications—

$$z = Mf \tag{3}$$

the factors outnumber the tests, they cannot be measured
but only estimated. To all men with the same set of
scores z will be attributed the same set of estimated factors
$\hat f$, though their “true” factors may be different. A
regression method of estimation minimizes the squares of
the discrepancies between $\hat f$ and f, summed over the men.
The regression equation (31) will be for one factor $f_i$—

$$\begin{vmatrix} \hat f_i & z' \\ m_i & R \end{vmatrix} = 0 \tag{32}$$

where $m_i$ is a column of M. Expanding, we have

$$\hat f_i = m_i'R^{-1}z$$

and in general

$$\hat f = M'R^{-1}z \tag{33}$$

or, separating the common factors and the specifics—

$$\hat f_0 = M_0'R^{-1}z \tag{34}$$

$$\hat f_1 = M_1R^{-1}z \tag{35}$$

the latter of which shows that we know the proportionate
weights for each specific (the rows of $R^{-1}$) even before we
know whether that specific exists (Wilson, 1934, 194).
The matrix of covariances of the estimated factors is—

$$M'R^{-1}M = \begin{bmatrix} M_0'R^{-1}M_0 & M_0'R^{-1}M_1 \\ M_1R^{-1}M_0 & M_1R^{-1}M_1 \end{bmatrix} \tag{36}$$

a square idempotent matrix of order equal to the number
of factors, but trace only equal to the number of tests.

For one common factor, (34) reduces to Spearman’s
estimate—

$$\hat g = \frac{1}{1 + S}\sum_i \frac{r_{ig}z_i}{1 - r_{ig}^2} \tag{34a}$$

where

$$S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}$$

while $K = M_0'R^{-1}M_0$ in (36) reduces to S/(1 + S), the
variance of $\hat g$.

10a. Ledermann’s short cut (1938a, 1939b).—The above
requires the calculation of the reciprocal of the large square
matrix R. Ledermann’s short cut only requires the reciprocal
of a matrix of order equal to the number of common
factors. As long as the factors are orthogonal we have—

$$R = M_0M_0' + M_1^2 \tag{10}$$

and the identity

$$M_0'M_1^{-2}(M_0M_0' + M_1^2) = (M_0'M_1^{-2}M_0 + I)M_0' = (J + I)M_0' \text{ say.}$$

Premultiplying by $(I + J)^{-1}$ and postmultiplying by $R^{-1}$
we reach

$$(I + J)^{-1}M_0'M_1^{-2} = M_0'R^{-1} \tag{36.1}$$

and the left-hand quantity can then be used in Equation
(34).

This short cut requires modification when the factors are 
oblique. See Equations (70.1) to (70.4) below. 
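
The agreement between the direct regression weights and Ledermann's short cut can be checked numerically. The following is a minimal sketch in Python with NumPy; the loadings are invented, unit test variances are assumed, and the variable names are hypothetical.

```python
import numpy as np

M0 = np.array([[0.8,  0.3],
               [0.7,  0.4],
               [0.6, -0.2],
               [0.5, -0.5],
               [0.4,  0.1]])                # hypothetical common-factor loadings
M1_sq = 1.0 - (M0 ** 2).sum(axis=1)         # specific variances, the diagonal of M1^2
R = M0 @ M0.T + np.diag(M1_sq)              # Equation (10)

direct = M0.T @ np.linalg.inv(R)            # weights of (34): needs a 5 x 5 reciprocal
J = (M0.T / M1_sq) @ M0                     # J = M0' M1^(-2) M0
short_cut = np.linalg.solve(np.eye(2) + J, M0.T / M1_sq)   # (36.1): 2 x 2 only
K = direct @ M0                             # covariances of the estimated common factors

# Matrix (36) for all seven factors: idempotent, with trace equal to the number of tests.
M = np.hstack([M0, np.diag(np.sqrt(M1_sq))])
cov_est = M.T @ np.linalg.inv(R) @ M

# One common factor: Spearman's weights and the quantity S.
l = np.array([0.9, 0.8, 0.7, 0.6])
S = np.sum(l ** 2 / (1 - l ** 2))
spearman_weights = (l / (1 - l ** 2)) / (1 + S)
R_g = np.outer(l, l) + np.diag(1 - l ** 2)
```

The saving lies in the order of the matrix to be inverted: the number of common factors instead of the number of tests.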
11. Direct and indirect vocational advice.—If $z_0$ is an
occupation and z a battery of tests, the estimate of a
candidate’s occupational ability is—

$$\hat z_0 = r_0'R^{-1}z \tag{37}$$

where the $r_0$ are the correlations of the occupation with the
tests. If $z_0$ can be specified in terms of the common
factors of z, and a specific $s_0$ independent of z, then an
indirect estimate of $z_0$ via the estimated $f_0$ is possible. We
have—

$$z_0 = m_0'f_0 + s_0 \tag{38}$$

where $m_0'$ is a row of occupation loadings for the common
factors $f_0$ of z, and also—

$$\hat f_0 = M_0'R^{-1}z$$

Substitution in (38), assuming an average $s_0$ (= 0)
gives

$$\hat z_0 = m_0'M_0'R^{-1}z \tag{39}$$

But—

$$m_0'M_0' = r_0' \tag{40}$$

and (39) is identical with (37) (Thomson, 1936a). If, however,
$s_0$ is not independent of the specifics s of the battery,
(40) will not hold, and the estimate (39) made via an estimation
of the factors will not agree with the correct estimate
(37).
12. Computation methods.—The “Doolittle” method of
computing regression coefficients is widely used in America
(Holzinger, 1937a, 32). Aitken’s method, used and
explained in the text, is in the present author’s opinion
superior (Aitken, 1937a and b, with earlier references).
Regression calculations and many others are all special
cases of the evaluation of a triple matrix product $XY^{-1}Z$
where Y is square and non-singular, and X and Z may
be rectangular. The Aitken method writes these matrices
down in the form—

$$\left[\begin{array}{c|c} Y & Z \\ \hline -X & \cdot \end{array}\right]$$

and applies pivotal condensation until all entries to the
left of the vertical line are cleared off. All pivots must
originate from elements of Y. By giving X and Z special
values (including the unit matrix I) the most varied
operations can be brought under the one scheme.
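
The idea of the triple product can be sketched in a few lines of Python with NumPy. This is not Aitken's own tabular layout; it is a plain elimination without row interchanges, assuming the pivots from Y are non-zero (as they are for a positive definite correlation matrix), with a zero block placed below Z. The function name and the data are hypothetical.

```python
import numpy as np

def triple_product(X, Y, Z):
    """Evaluate X Y^(-1) Z by pivotal condensation of the array [Y Z; -X 0]."""
    n = Y.shape[0]
    T = np.block([[Y, Z],
                  [-X, np.zeros((X.shape[0], Z.shape[1]))]]).astype(float)
    for k in range(n):                      # pivots are taken from Y only
        pivot = T[k, k]
        for i in range(k + 1, T.shape[0]):
            T[i, :] -= (T[i, k] / pivot) * T[k, :]
    return T[n:, n:]                        # what remains to the right of the line

# Regression weights r' R^(-1) as the special case X = r', Y = R, Z = I.
R_bb = np.array([[1.0, 0.5, 0.3],
                 [0.5, 1.0, 0.4],
                 [0.3, 0.4, 1.0]])
r = np.array([[0.6, 0.5, 0.4]])
weights = triple_product(r, R_bb, np.eye(3))
```

Choosing X = I and Z = I would give the reciprocal of Y itself by the same routine.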

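13. Bartlett’s estimate of factors.—We have $z = M_0f_0 +
M_1f_1$, where $f_0$ and $f_1$ are column vectors of the common
and specific factors respectively and $M_1$ is a diagonal
matrix. Bartlett now makes the estimates $\tilde f_0$ such as will
minimize the sum of the squares of each person’s specifics
over the battery of tests, i.e.—

$$\frac{\partial}{\partial f_0}(f_1'f_1) = 0$$

i.e.—

$$\begin{aligned}
\frac{\partial}{\partial f_0}(M_1^{-1}z - M_1^{-1}M_0f_0)'(M_1^{-1}z - M_1^{-1}M_0f_0) &= 0 \\
M_0'M_1^{-2}z &= M_0'M_1^{-2}M_0\tilde f_0 \\
&= J\tilde f_0, \text{ say} \\
\tilde f_0 &= J^{-1}M_0'M_1^{-2}z
\end{aligned} \tag{41}$$

(Bartlett, 1937a, 100.)
One could also find the estimated specifics as—

$$\tilde f_1 = (I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1})M_1^{-1}z \tag{42}$$

Substituting—

$$z = (\,M_0 \mid M_1\,)\begin{bmatrix} f_0 \\ f_1 \end{bmatrix}$$

we get for the relation between $\tilde f$ and f—

$$\begin{bmatrix} \tilde f_0 \\ \tilde f_1 \end{bmatrix} = \begin{bmatrix} I & J^{-1}M_0'M_1^{-1} \\ 0 & I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1} \end{bmatrix}\begin{bmatrix} f_0 \\ f_1 \end{bmatrix} \tag{43}$$

and for the covariances of $\tilde f$ we get—

$$\begin{bmatrix} I + J^{-1} & 0 \\ 0 & I - M_1^{-1}M_0J^{-1}M_0'M_1^{-1} \end{bmatrix} \tag{44}$$

The error variances and covariances of the common
factors are—

$$E(\tilde f_0 - f_0)(\tilde f_0 - f_0)' = J^{-1}M_0'M_1^{-1}(I)M_1^{-1}M_0J^{-1} = J^{-1}M_0'M_1^{-2}M_0J^{-1} = J^{-1} \tag{45}$$

(Bartlett, 1937a, 100.)
When there is only one common factor, J becomes the
familiar quantity—

$$S = \sum_i \frac{r_{ig}^2}{1 - r_{ig}^2}$$

(Bartlett, 1935, 200.)
As was first noted by Ledermann *—

$$I + J^{-1} = (M_0'R^{-1}M_0)^{-1} = K^{-1} \tag{46}$$

(quoted by Thomson, 1938a); and using this we see that
the back estimates of the original scores from the regression
estimates $\hat f_0$ are identical with the insertion of Bartlett’s
estimates $\tilde f_0$ in the common-factor part of the specification
equations, viz.—

$$M_0K^{-1}M_0'R^{-1}z = M_0J^{-1}M_0'M_1^{-2}z \tag{47}$$

(Thomson, 1938a.)
Bartlett has pointed out that, using the same identity, in
the form K = J(I − K), it is easy to establish the reversible
relation between his estimates and regression estimates—

$$\hat f_0 = K\tilde f_0, \qquad \tilde f_0 = K^{-1}\hat f_0 \tag{48}$$

(Bartlett, 1938)

and he summarizes their different interpretation and properties
by the formulæ

$$E\{\hat f_0\} = E\{f_0\} = 0, \qquad E\{(\hat f_0 - f_0)(\hat f_0 - f_0)'\} = I - K \tag{49}$$

$$E_1\{\tilde f_0\} = f_0, \qquad E_1\{(\tilde f_0 - f_0)(\tilde f_0 - f_0)'\} = J^{-1} = K^{-1}(I - K) \tag{50}$$

where E denotes averaging over all persons, $E_1$ over all
possible sets of tests (comparable with the given set in
regard to the amount of information on the group
factors $f_0$).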
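
The relations between the two kinds of estimate lend themselves to a numerical check. The sketch below (Python with NumPy) uses invented loadings with unit test variances assumed; `bartlett_w` and `regression_w` are hypothetical names for the two matrices of weights applied to the scores z.

```python
import numpy as np

M0 = np.array([[0.8,  0.3],
               [0.7,  0.4],
               [0.6, -0.2],
               [0.5, -0.5],
               [0.4,  0.1]])                # hypothetical common-factor loadings
M1_sq = 1.0 - (M0 ** 2).sum(axis=1)         # diagonal of M1^2
R = M0 @ M0.T + np.diag(M1_sq)

J = (M0.T / M1_sq) @ M0                     # J = M0' M1^(-2) M0
K = M0.T @ np.linalg.inv(R) @ M0            # K = M0' R^(-1) M0

bartlett_w = np.linalg.solve(J, M0.T / M1_sq)     # Equation (41): J^(-1) M0' M1^(-2)
regression_w = M0.T @ np.linalg.inv(R)            # Equation (34): M0' R^(-1)

# Error covariances of Bartlett's estimates, as in Equation (45).
bartlett_error = bartlett_w @ np.diag(M1_sq) @ bartlett_w.T
```

Applied to the loadings themselves, `bartlett_w @ M0` is the unit matrix, which is the sense in which these estimates are unbiased for a given person, whereas the regression weights shrink the estimates by the factor K.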
* Letter of October 23, 1937, to Thomson.

14. Indeterminacy.—The fact that estimated factors, if
the factors outnumber the tests, necessarily have less than
unit variance has sometimes been expressed in the case of
one common factor by postulating an indeterminate
vector i whose variance completes unity. This i may be
regarded as the usual error of estimation, and is a function
of the specific abilities (Thomson, 1934, B. J. P., 25, 92). That
$M'R^{-1}M$ in Equation (36) is of rank less than its order also
expresses the indeterminacy, and allows the factors to be
rotated to different positions which nevertheless fulfil all
the required conditions. In the hierarchical case the
transformation which effects this is (Thomson, 1935a)—

$$f = B\varphi \tag{51}$$

where B means the required number of rows of—

$$B = I - \frac{2qq'}{q'q} \tag{52}$$

in which—

$$q_i = l_i/m_i \quad \text{(see Equation 7)} \tag{53}$$

as far as there exist tests, after which q is arbitrary.
For—

$$z = Mf = MB\varphi = M\varphi$$

since

$$MB = M \tag{54}$$

and z is thus expressed by identical specification equations
in terms of new factors $\varphi$. For such transformations in the
case of multiple factors see Thomson, 1936a, 40; and
Ledermann, 1938c.
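
A numerical illustration of (52) and (54) for a hierarchical battery is given below, in Python with NumPy. The saturations are invented. One assumption is made that the text above does not state: the element of q belonging to g is taken as −1, which is the value that makes Mq vanish when the remaining elements are $l_i/m_i$.

```python
import numpy as np

l = np.array([0.9, 0.8, 0.7, 0.6])          # hypothetical g saturations
m = np.sqrt(1.0 - l ** 2)
M = np.hstack([l[:, None], np.diag(m)])     # Spearman form (7): 4 tests, 5 factors

q = np.concatenate([[-1.0], l / m])         # assumed: -1 for g, then l_i / m_i
B = np.eye(5) - 2.0 * np.outer(q, q) / (q @ q)    # Equation (52)

f = np.array([1.0, 0.2, -0.5, 0.3, 0.8])    # one person's hypothetical factors
phi = B @ f                                 # B is symmetric and orthogonal, so f = B phi
```

The two sets of factors `f` and `phi` give exactly the same test scores, although the person's g differs between them; that is the indeterminacy.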

If the matrix M is divided into the part $M_0$ due to
common factors and the part $M_1$ due to specifics, as in
Equation (9), then Ledermann shows that if U is any
orthogonal matrix of order equal to the number of common
factors, the matrix—

[…]

wherein—

$$Q = \begin{bmatrix} I \\ -M_1^{-1}M_0 \end{bmatrix}$$

will satisfy the equation—

$$MB = M$$

Indeterminacy is entirely due to the excess of factors
over tests, i.e. to the fact that the matrix of loadings M
is not square. It can be in theory abolished by adding
a new test which contains no new factor, not even a new 
specific; or a set of new tests which add fewer factors 
than their number, so that M becomes square (Thomson, 
1934b ; 1935a, 253). In the case of a hierarchy each of 
these tests singly will conform to the hierarchy, so that 
their saturations l can be found; but jointly they break 
the hierarchy. If they add no new factors, g can then be 
found without any indeterminacy. 

15. Finding g saturations from an imperfectly hierarchical 
battery—The Spearman formula given in Chapter III, 
Section 5, is the most usual method. A discussion of other 
methods will be found in Burt (1936, 288-7). See also 
Thomson (1934a, 370), for an iterative process modified
from Hotelling. 

16. Sampling errors of tetrad-differences.—The formulæ
(16) and (16A) given in the text are both approximations,
but appear to be very good approximations. The primary
papers are Spearman and Holzinger (1924 and 1925).
Critical examinations of the formulæ have been made by
Pearson and Moul (1927), and Pearson, Jeffery, and Elderton
(1929). Wishart (1928) has considered a quantity $p'$
which is equal to $pN^2/(N - 1)(N - 2)$, where p is the
tetrad-difference of the covariances a instead of the correlations,
and obtained an exact expression for the standard
deviation $\sigma$ of $p'$—

$$(N - 2)\sigma^2 = \frac{N + 1}{N - 1}D_{12}D_{34} - D \tag{55}$$

where the D’s are determinants of the following matrix
and its quadrants:

$$\left[\begin{array}{cc|cc}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{array}\right]$$

But approximate assumptions are necessary when the
standard deviation of the ordinary tetrad-difference of the
correlations is deduced from that of $p'$. The result for
the variance of the tetrad-difference is—

$$\frac{N + 1}{(N - 1)(N - 2)}(1 - r_{12}^2)(1 - r_{34}^2) - \frac{R}{N - 2} \tag{56}$$

where R is the 4 × 4 determinant of the correlations.

17. Selection from a multivariate normal population.— 
The primary papers are those of Karl Pearson (1902 and 
1912). The matrix form given in the text (Chapter XIX, 
Section 2) is due to Aitken (1934), who employed Soper’s 
device of the moment-generating function, and made a 
free use of the notation and methods of matrices. A 
variant of it which is sometimes useful has been given by 
Ledermann (Thomson and Ledermann, 1938) as follows. 
If the original matrix is subdivided in any symmetrical 
manner : 


(N — ra) — R - (56) 


: PP es 
2 N F 
3 


and R,, is changed by selection to V/, then each resulting 
sub-matrix, including V,, itself, is given by the formula 


Vip = Rig N, „E., Rpg } 5 
13 — itl (57) 
where pp = R, — R, „R, 
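
Formula (57) can be checked numerically. The following is only a minimal sketch in Python with NumPy, not part of the original text. The function name `select`, the three-test correlation matrix, and the choice of selecting directly on the first test alone are illustrative assumptions.

```python
import numpy as np

def select(R, p_idx, V_pp):
    """Apply V_ij = R_ij - R_ip E_pp R_pj to every sub-matrix at once.

    R     : full covariance (or correlation) matrix before selection
    p_idx : indices of the directly selected variables (block p)
    V_pp  : covariance matrix of block p after selection
    """
    R = np.asarray(R, dtype=float)
    R_pp_inv = np.linalg.inv(R[np.ix_(p_idx, p_idx)])
    E_pp = R_pp_inv - R_pp_inv @ V_pp @ R_pp_inv
    R_all_p = R[:, p_idx]              # R_ip for every row block i
    return R - R_all_p @ E_pp @ R_all_p.T

# Hypothetical three-test example: selection shrinks the variance of
# test 1 from 1 to 0.25.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
V = select(R, [0], np.array([[0.25]]))
print(np.round(V, 4))
```

With one selected variable the result agrees with the familiar univariate rule: the covariance of test 1 with test 2 becomes 0.25 × 0.6, and the partial variance of test 2 with test 1 held constant is unchanged by the selection.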


17a. Maximum likelihood estimation. The maximum
likelihood equations for estimating factor loadings (Lawley, 
1940, 1941, 19435) may be expressed fairly simply in the 
notation of previous sections. It is necessary, however, 
to distinguish between the matrix of observed correlations,
which we shall denote by $R_0$, and the matrix

$R = M_0 M_0' + M_1^2$

which represents that part of $R_0$ which is “explained” by
the factors.

The equations may then be written—

$M_0' R^{-1} R_0 R^{-1} = M_0' R^{-1}$ . . . (58)

These are not very suitable for computational work.
It may, however, be shown that—

$M_0' R^{-1} = (I - K) M_0' M_1^{-2} = (I + J)^{-1} M_0' M_1^{-2}$ . . . (59)

where, as before,

$K = M_0' R^{-1} M_0, \quad J = M_0' M_1^{-2} M_0$

Hence our equations may be transformed into the
form—

$M_0' = (I + J)^{-1} M_0' M_1^{-2} R_0$ . . . (60)

or alternatively,

$M_0' = J^{-1} (M_0' M_1^{-2} R_0 - M_0')$ . . . (61)

When there are two or more general factors the above
equations will have an infinite number of solutions corresponding
to all the possible rotations of the factor axes.
A unique solution may, however, be found such as to
make J a diagonal matrix.
Finally, if we put—

$L = M_0' M_1^{-2} R_0 - M_0'$
$V = L M_1^{-2} M_0,$

then, from the last set of equations

$V = J M_0' M_1^{-2} M_0 = J^2.$

Hence we have

$M_0' = V^{-1/2} L$ . . . (62)

These equations have been found the most convenient in
practice, since they can be solved by an iterative process.
When first approximations to $M_0$ and $M_1$ have been obtained,
they can be used to provide second approximations
by substitution in the right-hand side.
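
The identities of this section can be verified on a small artificial example. The sketch below (Python with NumPy) is an illustration only and is not Lawley's own computing routine. It assumes a hypothetical two-factor loading matrix `M0`, takes the observed matrix to be exactly reproduced by the factors, and uses the symmetric square root of V, which is one possible choice. It then checks that identity (59) holds and that the true loadings are left unchanged by one substitution in equation (62).

```python
import numpy as np

def inv_sqrt_sym(V):
    """Inverse symmetric square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(V)
    return Q @ np.diag(w ** -0.5) @ Q.T

def update_loadings(M0, m1_sq, R0):
    """One substitution in the right-hand side of M0' = V^(-1/2) L.

    M0    : (n, r) loadings on the common factors
    m1_sq : (n,)   diagonal of M1^2, the specific variances
    R0    : (n, n) observed correlation matrix
    """
    W = M0.T / m1_sq              # M0' M1^-2 (M1^2 is diagonal)
    L = W @ R0 - M0.T
    V = L @ W.T                   # L M1^-2 M0
    return (inv_sqrt_sym(V) @ L).T

# Hypothetical loadings; R0 is built so that the factors fit exactly.
M0 = np.array([[0.8, 0.3], [0.7, 0.4], [0.6, -0.2], [0.5, -0.5], [0.4, 0.1]])
m1_sq = 1.0 - (M0 ** 2).sum(axis=1)
R0 = M0 @ M0.T + np.diag(m1_sq)

J = (M0.T / m1_sq) @ M0
lhs_59 = M0.T @ np.linalg.inv(R0)                          # M0' R^-1
rhs_59 = np.linalg.inv(np.eye(2) + J) @ (M0.T / m1_sq)     # (I + J)^-1 M0' M1^-2
M0_next = update_loadings(M0, m1_sq, R0)
```

In real work $R_0$ is not reproduced exactly, the specific variances must be revised at each step as well, and convergence may be slow; none of that is attempted here.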


18. Reciprocity of loadings and factors in persons and
traits (Burt, 1937b).—Let W be a matrix of scores centred
both by rows and columns. Its dimensions are traits ×
persons (t . p), and its rank is r, smaller than
both t and p in consequence of the double centring. The
two matrices of covariances are WW′ for traits and W′W
for persons, and by a theorem first enunciated by Sylvester
in 1883 (independently discovered by Burt), their non-zero
latent roots are the same. If their dimensions differ,
i.e. t ≠ p, the larger one will have additional zero roots.
Let the non-zero roots form the diagonal matrix D. Then
the principal axes analyses are:

$W = H_1 D^{1/2} F_1$, dimensions (t . r)(r . r)(r . p)
and $W' = H_2 D^{1/2} F_2$, dimensions (p . r)(r . r)(r . t)
where $H_1$ and $H_2$ are the latent vectors of WW′ and W′W,
while $F_1$ is the matrix of factors possessed by persons,
$F_2$ that of factors possessed by traits. From the analysis
of W we have, taking the transpose—

$W' = F_1' D^{1/2} H_1'$, dimensions (p . r)(r . r)(r . t)
and comparison of this with the former expression for W′
makes the reciprocity of $H_1$ and $F_2'$, $F_1$ and $H_2'$, evident.
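
The reciprocity may be illustrated with random numbers. This is a sketch only (Python with NumPy); the sizes t = 4 and p = 7, the random scores, and all variable names are assumptions made for the illustration. It takes the factors to be orthonormal, so that the square roots of the latent roots appear in the analysis of W.

```python
import numpy as np

rng = np.random.default_rng(0)
t, p = 4, 7                        # traits x persons (illustrative sizes)
X = rng.normal(size=(t, p))
# Centre both by rows and by columns.
W = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + X.mean()

r = np.linalg.matrix_rank(W)
roots_t, vecs_t = np.linalg.eigh(W @ W.T)    # roots in ascending order
roots_p, vecs_p = np.linalg.eigh(W.T @ W)

D = roots_t[::-1][:r]                        # the r non-zero latent roots
H1 = vecs_t[:, ::-1][:, :r]                  # latent vectors of WW'
F1 = (H1.T @ W) / np.sqrt(D)[:, None]        # factors: D^(-1/2) H1' W

print(r, np.round(D, 4))
```

The rows of `F1`, obtained purely from the analysis by traits, turn out to be latent vectors of W′W, which is the reciprocity asserted above.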

19. Oblique factors. Structure and pattern—In Thurstone’s
notation, which we shall follow in this paragraph,
the matrix M of our equation (3), when it refers to centroid
factors, is called F. Our equation (3) becomes in his
notation—

s = Fp 
Since centroid factors are orthogonal, F is both a pattern 
and a structure. The structure is the matrix of correla- 
tions between tests and factors, i.e. : 
Structure = sp’ = (Fp)p' = F(pp') = FI = F = Pattern. 

When the factors are oblique, however, this is not the 
case. In that case, Structure = Pattern x matrix of 
correlations between the factors. 

Thurstone turns the centroid factors to a new set of 
positions (still within the common-factor space, and in
general oblique to one another) called reference vectors.
The rotating matrix is A, and

$V = FA$ . . . (63)

is the structure on the reference vectors. The cosines of
the angles between the reference vectors are given by A′A.
V is not a pattern. Its rows cannot be used as coefficients
in equations specifying a man’s scores in the tests, given 
his scores in the reference vectors. The pattern on the 
reference vectors would not have those zeros which are 
found in V. 

The primary factors are the lines of intersection of the
hyperplanes which are at right angles to the reference
vectors, taken (r − 1) at a time where r is the number of
common factors, the number of dimensions in the common-factor
space. They are defined, therefore, by the equations
of the hyperplanes, taken (r − 1) at a time. These
equations are

$A'a = 0$ . . . (64)

where a is a column vector of co-ordinates along the
centroid axes. The direction cosines of the intersections
of these hyperplanes taken (r − 1) at a time are therefore
proportional to the elements in the columns of $(A')^{-1}$,
and to make them into direction cosines this has to have
its columns normalized by post-multiplication by a diagonal
matrix D, giving for the structure on the primary factors

$F(A')^{-1}D$ . . . (65)

D is also the matrix of correlations between the reference
vectors and the primary factors, for

$A'(A')^{-1}D = D$ . . . (66)

Each primary factor is therefore correlated with its own
reference vector but orthogonal to all the others, as can
also be easily seen geometrically.

The matrix of intercorrelations of the primary factors
is $DA^{-1}(A')^{-1}D$ from equation (65).
If W is the pattern on the primary factors p, so that
test scores s = Wp
then the structure on the primary factors is also
sp′ = Wpp′
where pp′ is the matrix of correlations between the primary
factors, and therefore

primary factor structure = $W D A^{-1} (A')^{-1} D$ . . . (67)
Also, this structure = $F(A')^{-1}D$ from (65).
Equating these we have:
$W D A^{-1} = F$ . . . (68)
whence $W = FAD^{-1}$ . . . (69)
We have, therefore,

                      Structure         Pattern
Reference vectors:    $FA$              $F(A')^{-1}$
Primary factors:      $F(A')^{-1}D$     $FAD^{-1}$        . . . (70)

where the reference-vector pattern has been entered
by analogy but could easily be independently found.
It will be seen that the structure and pattern of the 
primary factors are identical with the pattern and struc- 
ture of the reference vectors except for the diagonal 
matrix D. The structure of the one is the pattern of the 
other multiplied by D. 
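
A small numerical case may make these relations concrete. The sketch below (Python with NumPy) is illustrative only. The centroid loadings `F` and the two oblique reference vectors in `A` are hypothetical values chosen for easy arithmetic, and the variable names are not from the text.

```python
import numpy as np

# Hypothetical centroid loadings of four tests on two factors.
F = np.array([[0.7, 0.3], [0.6, 0.5], [0.5, -0.4], [0.4, -0.6]])
# Columns of A are unit reference vectors, oblique to one another.
A = np.array([[0.6, 1.0],
              [0.8, 0.0]])

T_raw = np.linalg.inv(A.T)                     # columns proportional to the primaries
D = np.diag(1.0 / np.linalg.norm(T_raw, axis=0))
T = T_raw @ D                                  # direction cosines of the primaries

structure_ref = F @ A                          # V = FA
pattern_ref = F @ np.linalg.inv(A.T)           # F (A')^-1
structure_prim = F @ T                         # F (A')^-1 D
pattern_prim = F @ A @ np.linalg.inv(D)        # F A D^-1
primary_corr = T.T @ T                         # D A^-1 (A')^-1 D

print(np.round(np.diag(D), 3))
print(np.round(primary_corr, 3))
```

Here both diagonal elements of D come out as 0.8 and the primaries correlate −0.6, while the reference vectors correlate +0.6. Each structure is the other set's pattern multiplied by D.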

This theorem is not confined to the case of simple 
structure, but is more general, and applies to any two sets 
of oblique axes with the same origin O, of which the axes 
of the one set are intersections of “primes” taken r − 1
at a time in the space of r dimensions, and the axes of the
other set are lines perpendicular to those primes. By
prime is meant a space of one dimension less than the whole,
i.e. Thurstone’s hyperplane. The projections of any point
P on to the one set of axes are identical with the projections
thereon of its oblique co-ordinates on the other set, which
sentence is equivalent to the matrix identities (see 70)—

$FA = FAD^{-1} \times D$
and $F(A')^{-1}D = F(A')^{-1} \times D$
(Structure on one set) = (Pattern on other set) × (D, to project it on to the first set).


A diagram makes this obvious in the two-dimensional case 
and gives the key to the situation. A perspective diagram 
of the three-dimensional case is not very difficult to make 
and is still more illuminating. The vector (or test) OP 
is the “ resultant ” of its oblique co-ordinates (the pattern), 
but not of its projections (the structure). It is of interest 
to notice that, either on the reference vectors or on the 
primary factors— 

Pattern x Transpose of Structure = Test-correlations. 
This serves as a useful check on calculations. It is geo- 
metrically immediately obvious. For consider a space 
defined by n oblique axes, with origin O, and any two 
points P and Q each at unit distance from O. The direc- 
tions OP and OQ may be taken as vectors corresponding 
to two tests, and cos POQ to the test correlation. 

Consider the pattern, on these axes, of OP, and the 
structure, on the same axes, of OQ. The former is composed
of the oblique co-ordinates of the point P, the latter
of the projections on the axes of the point Q, which projections
(OQ being unity) are cosines. Then the inner
product of those oblique co-ordinates of P with these cosines 
obviously adds up to the projection of OP on OQ, that is 
to cos POQ, or the correlation coefficient. 
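
The geometrical argument can be tried in two dimensions. In this sketch (Python with NumPy) the angles of the two oblique axes (0° and 60°) and of the unit vectors OP and OQ (20° and 75°) are arbitrary assumptions, and the helper `unit` is introduced only for the illustration.

```python
import numpy as np

def unit(deg):
    """Unit vector at the given angle (in degrees) from the first rectangular axis."""
    a = np.deg2rad(deg)
    return np.array([np.cos(a), np.sin(a)])

axes = np.column_stack([unit(0.0), unit(60.0)])   # two oblique axes through O
P, Q = unit(20.0), unit(75.0)                     # two tests at unit distance from O

pattern_P = np.linalg.solve(axes, P)    # oblique co-ordinates of P
structure_Q = axes.T @ Q                # projections (cosines) of Q on the axes
inner = pattern_P @ structure_Q

print(inner, np.cos(np.deg2rad(55.0)))
```

Because OP is the resultant of its oblique co-ordinates, its projection on OQ is the sum of the projections of those co-ordinates, which is the inner product computed here.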

In estimating oblique factors by regression, since the
correlations between factors and tests must be used, the
relevant equation is

$\hat{f}_0 = \{F(A')^{-1}D\}' R^{-1} z$ . . . (70.1)

Ledermann’s short cut (section 10a above) requires considerable
modification for oblique factors. We no longer have

$R = M_0 M_0' + M_1^2$

but

Pattern × transpose of structure + $M_1^2$ = R

i.e. in Thurstone’s notation

$(FAD^{-1})\{F(A')^{-1}D\}' + F_1^2 = R$ . . . (70.2)

and using this (Thomson, 1949), we reach the equation

$\hat{f}_0 = (I + J)^{-1} \{F(A')^{-1}D\}' F_1^{-2} z$ . . . (70.3)

where now

$J = \{F(A')^{-1}D\}' F_1^{-2} (FAD^{-1})$ . . . (70.4)

in place of Ledermann’s $J = M_0' M_1^{-2} M_0$.

Only reciprocals of matrices of order equal to the
number of common factors are now required, but the
calculation, like all concerning oblique factors, is still one
of considerable labour.
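
That the short cut gives the same regression weights as the direct calculation can be confirmed on a small case. The following is a sketch under stated assumptions (Python with NumPy): the loadings `F`, the reference vectors `A`, and all names are hypothetical, and the unique variances are simply taken as one minus the communalities.

```python
import numpy as np

F = np.array([[0.7, 0.3], [0.6, 0.5], [0.5, -0.4], [0.4, -0.6]])   # hypothetical
A = np.array([[0.6, 1.0],
              [0.8, 0.0]])                     # unit reference vectors (columns)

T_raw = np.linalg.inv(A.T)
D = np.diag(1.0 / np.linalg.norm(T_raw, axis=0))
S = F @ T_raw @ D                              # structure on the primaries
Wp = F @ A @ np.linalg.inv(D)                  # pattern on the primaries

u_sq = 1.0 - (F ** 2).sum(axis=1)              # diagonal of the unique part
R = Wp @ S.T + np.diag(u_sq)                   # pattern x structure' + unique part

# Direct regression weights: structure' R^-1 (needs the n x n reciprocal).
direct = S.T @ np.linalg.inv(R)

# Short cut: only an r x r reciprocal is required.
J = (S.T / u_sq) @ Wp
shortcut = np.linalg.inv(np.eye(2) + J) @ (S.T / u_sq)

print(np.round(direct, 4))
```

Multiplying either set of weights into a column of standardized test scores gives the same estimates of the oblique factors.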

19a. Second-order factors.—The above primary factors 
can themselves in their turn be factorized into one, two, or 
more second-order factors, and a factor-specific for each 
primary. If the rank of the matrix of intercorrelations 
of the primaries can be reduced by diagonal entries to, say,
two, then the r primaries will be replaced by r + 2 second-order
factors which will no longer be in the original
common-factor space. The correlations of the primaries
with these second-order factors will form an oblong matrix
with its first two columns filled, but each subsequent
column will have only one entry corresponding to a factor-specific,
thus:

r r r . . . .
r r . r . . .
r r . . r . .     = E (say),
r r . . . r .
r r . . . . r


where subscripts must be supplied to indicate the primary 
(the row) and the second-order factor (the column). 

The primary factors can be thought of as added to the 
actual tests, their direction cosines being added as rows 
below F, which thus becomes : 


$\begin{bmatrix} F \\ DA^{-1} \end{bmatrix}$

Imagine this matrix post-multiplied by a rotating matrix
Ψ, with r rows and r + 2 columns, which will give the
correlations with the r + 2 second-order factors. The
lower part of the resulting matrix will be E, which we
already know. That is—

$DA^{-1}\Psi = E$ . . . (71)
$\Psi = AD^{-1}E$ . . . (72)

and the correlations of the original tests with the second-order
factors are then:

$G = F\Psi = FAD^{-1}E = WE$ . . . (73)
G is both a structure and a pattern, with continuous 
columns equal in number to the general second-order 
factors, followed by a number of columns equal to the 
number of primaries, this second part forming an orthog- 
onal simple structure. 
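
The construction can be followed through numerically. The sketch below (Python with NumPy) is an illustration under assumptions: three primaries, two general second-order factors with hypothetical loadings `g`, and hypothetical centroid loadings `F`. The primaries are placed in the common-factor space by a Cholesky factor of their intercorrelations, which is merely one convenient choice of axes.

```python
import numpy as np

# Hypothetical loadings of r = 3 primaries on two general second-order factors.
g = np.array([[0.6, 0.3],
              [0.5, 0.5],
              [0.4, -0.3]])
s = np.sqrt(1.0 - (g ** 2).sum(axis=1))        # loadings on the factor-specifics
E = np.hstack([g, np.diag(s)])                 # r rows, r + 2 columns

primary_corr = E @ E.T                         # intercorrelations of the primaries
T = np.linalg.cholesky(primary_corr).T         # columns: direction cosines, T'T = primary_corr
T_inv_t = np.linalg.inv(T).T
D = np.diag(1.0 / np.linalg.norm(T_inv_t, axis=0))
A = T_inv_t @ D                                # unit reference vectors, so that D A^-1 = T'

Psi = A @ np.linalg.inv(D) @ E                 # equation (72)

F = np.array([[0.6, 0.3, 0.1],
              [0.5, -0.2, 0.4],
              [0.4, 0.4, -0.3],
              [0.3, -0.5, -0.2]])              # hypothetical centroid loadings
G = F @ Psi                                    # tests x second-order factors

print(np.round(G, 3))
```

Since ΨΨ′ is the unit matrix in this construction, G reproduces the same correlations between tests as F does, although its columns lie partly outside the original common-factor space.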

20. Boundary conditions.—These refer to the conditions 
under which a matrix of correlation coefficients can be 
explained by orthogonal factors which run each through 
only a given number of tests. The problem was first 
raised by Thomson (1919b) and a beginning made with 
its solution (J. R. Thompson, Appendix to Thomson’s 
paper). Various papers by J. R. Thompson culminated 
in that of 1929, and see also Black (1929). Thomson 
returned to the problem in connexion with rotations in the 
common-factor space (Thomson, 1936b), and Ledermann
gave rigorous proofs of the theorems enunciated by
Thomson and Thompson and extended them (Ledermann,
1936). A necessary condition is that if the largest latent 
root of the matrix of correlations exceeds the integer s, 
then factors which run through s tests only and have zero 
loadings in the other tests are certainly inadequate. This 
rule has not been proved to be sufficient, and when applied 
to the common-factor space only it is certainly not suf- 
ficient, though it seems to be a good guide. Ledermann 
(1936, 170-4) has given a stringent condition as follows. 
If we define the nullity of a square matrix as order minus
rank, then if it is to be possible to factorize orthogonally a
matrix R of rank r in such a way that the matrix of loadings
contains at least r zeros in each of its columns, the
sum of the nullities of all the r-rowed principal minors of
R must at least be equal to r.
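
The necessary condition on the largest latent root is easily applied. The following is a minimal sketch (Python with NumPy); the function name `min_tests_per_factor` and the equicorrelated four-test matrix are assumptions made for illustration, and the function reports only what the rule fails to exclude, not what is actually attainable.

```python
import math
import numpy as np

def min_tests_per_factor(R):
    """Smallest integer s not ruled out by the latent-root condition.

    Factors running through only s tests are certainly inadequate when the
    largest latent root of R exceeds s, so s must be at least that root.
    """
    largest_root = np.linalg.eigvalsh(np.asarray(R, dtype=float))[-1]
    return math.ceil(largest_root - 1e-12)     # small guard against rounding error

# Four tests, every correlation 0.5: the largest root is 1 + 3(0.5) = 2.5.
n, rho = 4, 0.5
R = np.full((n, n), rho) + (1 - rho) * np.eye(n)
s_min = min_tests_per_factor(R)
print(s_min)
```

In this example factors confined to pairs of tests are excluded, since 2.5 exceeds 2, while factors running through three tests are not excluded by this rule alone.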

21. The sampling of bonds.—The root idea is that of the 
complete family of variates that can be made by all possible 
additive combinations of bonds from a given pool, and 


the complete family of correlation coefficients between
pairs of these. Thomson (1927b) mooted the idea and
worked out the example quoted in Chapter XX. He
had earlier (1927a) shown that with all-or-none bonds the
most probable value of a correlation coefficient is $\sqrt{p_1 p_2}$,
where the p’s are fractions of the whole pool forming the
variates, and the most probable value of a tetrad-difference
F, zero. Mackie (1928a) showed that the mean tetrad-difference
is zero, and its variance, for F—


8 = N = i [nps + pops + PiPs + Paps — 20D 


F pypoPa F PiPsPa ar Papapa) + 4D 
a — 2) (i= pi) — p(t — Pal — Pa) 


iz (N — 1) 
where N is the number of bonds in the whole pool. He 
found for the mean value of $r_{12}$ the value $\sqrt{p_1 p_2}$, and for
its variance

$\frac{(1 - p_1)(1 - p_2)}{N - 1}$
This is not the variance of all possible correlation
coefficients, but of those formed by taking fractions $p_1$ and
$p_2$ of the pool. The whole family of correlation coefficients
will be widely scattered by reason of the different values
of p, “rich” tests having high correlations, and those
with low p, low correlations. Mackie (1929) next extended
these formulae to variable coefficients (i.e. bonds which no
longer were all-or-none). He again found the mean value
of F to be zero, and for its variance— 


, AN V — 2) f2 2\)2. 2(N — 1) 


Í 9\2)2 
11 0 i 
EAIA 
The presence of Z in this is due to Mackie’s limitation to
positive loadings of the bonds. Thomson (1935b, 72)
removed this limitation and found—
AN —1) 
7 

Similarly, Mackie found for variable positive loadings
(1929)—


or? 


and for all loadings Thomson found (1935b)— 
23 
N 
Thomson suggested without proof that in general, when 
limits are set to the variability of the loadings of the bonds, 
resulting in a family of correlation coefficients averaging 7, 
these correlations will form a distribution with variance— 


1 4 
2 = —. — 
of 1 — 7) 


and will give tetrad-differences averaging zero with a 
variance— 

3 MNO = 2) [- 2a IN — 1 
a N iy} N: i 


G 


(a 72)? 
Summing up, Thomson says (1935b, 77-8): “The sampling
principle taken alone gives correlations of all values
and zero tetrad-differences if N be large. Fitting the
sampled elements with weights . . . if the weights may
be any weights . . . destroys correlation when N is infinite.
This means that on the Sampling Theory a certain approximation
to ‘all-or-none-ness’ is a necessary assumption
not to explain zero tetrad-differences, but to explain
the existence of correlations of . . . large size. . . . The
most important point in all this appears to me to be the
fact that on all these hypotheses the tetrad-differences tend to 
vanish. This tendency appears to be a natural one among 
correlation coefficients.” 

A tendency for tetrad-differences to vanish means, of 
course, a still stronger tendency for large minors of the 
correlational matrix to vanish. In more general terms, 
therefore, Thomson’s theorem is that in a complete family 
of correlation coefficients the rank of the correlation matrix 
tends towards unity, and that a random sample of variates 
from this family will (in less strong measure) show the 


same tendency.
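
The all-or-none case described earlier in this section lends itself to a simple simulation. The sketch below (Python with NumPy) rests on assumptions that are not stated in the text in this form: the bonds are independent and of equal variance, each test is the plain sum of its bonds, and the two tests draw exactly $p_1 N$ and $p_2 N$ bonds at random from the pool. Under these assumptions the correlation is the number of shared bonds divided by the geometric mean of the two totals, and the number shared follows a hypergeometric distribution. The function name and the numerical values are illustrative.

```python
import numpy as np

def simulate_bond_correlations(N, p1, p2, trials, seed=0):
    """Correlations between pairs of tests sampling all-or-none bonds from a pool of N."""
    rng = np.random.default_rng(seed)
    n1, n2 = round(p1 * N), round(p2 * N)
    # Bonds common to two random subsets of sizes n1 and n2.
    overlap = rng.hypergeometric(ngood=n1, nbad=N - n1, nsample=n2, size=trials)
    return overlap / np.sqrt(n1 * n2)

N, p1, p2 = 100, 0.5, 0.32
r = simulate_bond_correlations(N, p1, p2, trials=200_000)

print(r.mean(), np.sqrt(p1 * p2))                  # mean against sqrt(p1 p2)
print(r.var(), (1 - p1) * (1 - p2) / (N - 1))      # variance against (1-p1)(1-p2)/(N-1)
```

The simulated mean settles close to $\sqrt{p_1 p_2}$ = 0.4, and the scatter about it shrinks as the pool grows, in keeping with the statement that this is the variance only of correlations formed from the given fractions of the pool.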
264, 349.

Householder, 377. 


Independence of units, 331 ff. 
Indeterminacy, 357. 
Inequality of men, 315. 
Inner product, definition, 74. 
Irwin, 372. 

Iterative methods, 119. 


Jeffery, 40, 359, 374. 


Kelley, 17, 46, 112, 217, 283, 373. 
Kent, 185. 


Lacey, 122. 

Landahl, 161, 373. 

Latent root, 113, 124, 265, 350; 
definition, 110, TH 

Lawley, 124, 275, 373; maximum
likelihood method, 127 ff.,
333, 360.

Ledermann, 52, 82, 166, 168, 227, 
245, 284, 293, 338, 354, 357, 
358, 360, 367, 379 

Limits, to extent of factors, 167;
to number of factors, 121 ff.

Lindquist, 208, 378. 


Loadings, definition, 68, 333;
and factors, reciprocity, 267,
361; negative, 139, 261, 331;
standard errors, 134; see also
Saturations.


Mackie, 323, 347, 367, 368, 373. 

McNemar, 122, 374. 

Matrix, calculation of reciprocal,
209, 354; definition, 8, 144;
double centred, 268; Landahl,
162; multiplication, 112, 145;
notation, 346; orthogonal rotating,
144; “powering,” 112;
rank of, 48, 63; special orthogonal,
149; trace of, 337;
see also Latent root.

Maximum likelihood method, 
127 ff., 360. 

Medland, 90, 374. 

Metric, 329. 

Minors of a determinant, 64. 

Monarchic doctrine, 312.

Moods, analysis of, 261. 

Mosier, 121, 374. 

Moul, 40, 359, 374. 

Multiple correlation, computa- 
tion, 199, 206, 216 ; definition, 
197; with factors, 223; with 
g, 10. 

Multiple-factor analysis, 63 ff. 


Negative loadings, 139, 261, 331. 
Normal distribution, 35 ff.
Normalized scores, definition, 6. 
Normalizing, 159. 

Notation, 345, 346. 

Nuclear clusters, 28. 


Oblique factors, 170 ff., 341, 362;
estimation of, 245, 365.

Oligarchic doctrine, 312.

Orthogonal axes, 92, 189 ff.,
151 ff.

Orthogonal simple structure,
151 ff.

Orthogonality, test of, 171.

Otis-Kelley, correction for selec- 
tion, 283. 

Oval diagrams, 11, 69, 77, 308. 


Parallel proportional profiles,
169.

Parsimony of hypothesis, 15, 389.

Pattern and structure, 170, 176,
333, 362.

Pearson, K., 40, 275, 296, 359,
360, 374.

Peel, 219, 353, 374.

Physical measurements and hierarchical
order, 325.

Pivotal condensation, 65, 201,
209.

Pooling square, 197 ff., 350; and
centroid analysis, 216.

Price, 317, 374.

Primary factors, correlation with
reference vectors, 188; definition,
170, 176, 363.

Principal components, Hotelling,
107, 349; acceleration by
powering, 112; in common
factor space, 117; compared
with maximum likelihood loadings,
135; computation, 108 ff.,
264; Kelley’s method, 112;
significance test, 124.

Probable error, 36.


Raath, 184, 374. 

Rank of a matrix, 48; defini- 
tion, 65; low reduced, 318; 
reduced, 48; unchanged by 
selection, 291. 

Reciprocal matrix, calculation, 
209. 

Reciprocity of loadings and fac- 
tors, 267 ff., 339, 361. 

Reduced rank of a matrix, 48. 

Reference tests, 45. 
Reference vectors, correlation
with primary factors, 188;
definition, 170, 176; structure
on, 333, 362.

Regression, 195 ff.; equation,
351; estimates of factors, 353,
365.

Regression coefficients, computation,
205, 211, 226, 227, 355;
correlation of, 211; covariances
of, 211; geometrical
picture, 212; partial, 199;
standard error, 203, 211;
variances of, 211.

Reliability, coefficient, 37; 
weighting for battery, 219. 
Residues, 44; significance tests,
120 ff., 131.

Reyburn, 122, 146, 183, 184,
341, 374.

Richness of tests, 310, 330.

Rosner, 91.

Rotation of axes, 139 ff., 330;
Alexander, 79, 140; by extended
vectors, 157; Landahl,
162; Ledermann, 166; orthogonal,
347; to orthogonal
simple structure, 154 ff.;
Spearman, 100; Thurstone,
103.


Sampling error, 33 ff., 121; of 
tetrad differences, 40, 359. 

Sampling theory of ability,
307 ff., 330, 348, 367; Dodd,
371.

Saturations, 8; see also Load- 
ings. 

Second-order factors, 186 ff., 365. 

Selection, and communalities,
283; and factor loadings,
284 ff.; geometrical picture,
276; and matrix rank, 291;
multivariate, 294 ff., 360;
Otis-Kelley formula, 283; and
partial correlation, 282; and
simple structure, 303; univariate,
275 ff.; and variance
of differences, 281.

Sheppard, 95. 

Sign changing in centroid anal- 
ysis, 71, 106. 

Significance, of correlation residues,
120 ff., 131; definition,
36, 121; of principal components,
124.

Simple structure, criticisms, 182;
Horst, 372; and independence
of units, 331 ff.; Ledermann’s
method, 166; orthogonal, 151
ff.; rotation to, 154.

Singly conforming tests, 55. 

Soper, 360. 

Space, common-factor, 102, 117, 
337; ellipsoidal, 93; spher- 
ical, 97. 

Spearman, passim, 374. 

Specific factors, 8, 335; maxi- 
mized, 120, 312. 

Standard deviation, definition, 5. 

Standard error, of correlation
coefficients, 39; of loadings,
134; of a tetrad-difference,
40, 359; of variance, 88; of
z, 40.

Standardized scores, definition, 
6. 

Stephenson, 17, 18, 49, 249, 251,
259, 261, 371, 375.

Structure and pattern, 170, 176,
333, 362.

Sub-pools of the mind, 313.

Swineford, 122, 375.


Taylor, 122, 146, 183, 341, 374.

Tetrad-differences, definition, 12; 
distribution, 41; and sampl- 
ing theory, 311, 367; stan- 
dard error, 40, 359. 
Thompson, J. R., 366, 375.
Thomson, passim, 375.

Thurstone, L. L., passim, 376.
Thurstone, T. G., 377.

Trace of a matrix, 337.

Tryon, 20, 26, 377.

Tucker, 121, 377.

Turnbull, 345.

Two-factor theory, 5 ff., 347.


Unique communalities, condi- 
tions for, 83. 


Variance, absolute, of tests, 326, 
329 ; analysis of, and multiple 
correlation, 208; definition, 
5, 11; of differences in sam- 
ples, 281; of estimated fac- 
tors, 238 ; of factor, 78 ; and 
oval diagrams, 11; of regres- 
sion coefficients, 211 ; of sam- 
ples, 37. 

Vectors, definition, 102; ex- 
tended, 156, 172; reference, 
170. 

Verbal factor, 15. 

Vernon, 122, 377. 

Vocational advice, 231, 355. 


Weighted battery, Spearman’s 
weights, 10. 

Wherry, 203, 377.

Wilson, 83, 123, 227, 354, 377.

Wishart, 40, 359, 377.

Worcester, 83, 123, 377.

Wright, 317, 377. 


Yates, 132, 372. 
Young, 377. 
z-transformation of correlations, 
39. 


